HadrienG / InSilicoSeq

:rocket: A sequencing simulator
https://insilicoseq.readthedocs.io
MIT License

Kingdom limitation #257

TheLokj closed this issue 3 months ago

TheLokj commented 3 months ago

Hi !

Since the NCBI Entrez service allows queries for eukaryotic data with the search term eukaryota[Organism], I was wondering why InSilicoSeq (and more specifically app.py) only allows the kingdoms bacteria, viruses and archaea?
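To make the question concrete, here is a minimal sketch of how a kingdom restriction like this is commonly enforced in a Python CLI. It assumes the limit is an argparse choices list on a --ncbi/-k option; the helper names (build_parser, entrez_term) are hypothetical and this is not copied from InSilicoSeq's app.py, which may be structured differently. The only Entrez-specific part is the standard [Organism] field tag.

```python
import argparse

# Assumption: the allowed kingdoms are an argparse `choices` list,
# similar in spirit to (but not copied from) InSilicoSeq's app.py.
KINGDOMS = ["bacteria", "viruses", "archaea"]


def build_parser(allowed_kingdoms):
    """Build a minimal, hypothetical 'generate'-style parser."""
    parser = argparse.ArgumentParser(prog="iss-sketch")
    parser.add_argument(
        "--ncbi",
        "-k",
        choices=allowed_kingdoms,  # every value is checked against this list
        nargs="+",
        help="Kingdom(s) to download genomes from",
    )
    return parser


def entrez_term(kingdom):
    # Hypothetical query builder using the Entrez [Organism] field tag.
    return f"{kingdom}[Organism]"


parser = build_parser(KINGDOMS)
args = parser.parse_args(["-k", "bacteria", "viruses"])
print([entrez_term(k) for k in args.ncbi])
# ['bacteria[Organism]', 'viruses[Organism]']

# 'eukaryota' is rejected: argparse prints an error and exits with status 2.
try:
    parser.parse_args(["-k", "eukaryota"])
except SystemExit as exc:
    print("rejected, exit code", exc.code)

# Widening the choices list is all it takes for the CLI to accept it.
extended = build_parser(KINGDOMS + ["eukaryota"])
print(extended.parse_args(["-k", "eukaryota"]).ncbi)
```

In a setup like this the restriction is purely a CLI design decision: the query syntax itself would accept eukaryota[Organism], but the parser refuses it before any request is made.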

I'm new to bioinformatics, so there is probably a reason behind this, but I'd rather be sure.

HadrienG commented 3 months ago

Hi!

InSilicoSeq was developed primarily for metagenomics, where you usually (there are some exceptions) do not want eukaryotic DNA in your samples.

Most often, eukaryotic DNA is still present if you are taking samples from a host, but this host DNA is usually discarded before downstream analysis. Users who would like to include host DNA will want a specific organism, and will therefore use their own abundance file to control which species they simulate. I don't see many users needing to simulate reads from random eukaryotic organisms.
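As a rough illustration of the "own abundance file" route, the sketch below builds an abundance table that gives a chosen host genome a fixed share of the reads and rescales the microbial genomes to fill the rest. It assumes the abundance file is tab-separated with a record ID in the first column and an abundance in the second, abundances summing to 1, and that the IDs match the FASTA record IDs of the input genomes; the helper names and IDs (abundance_table, write_abundance, host_chr1, genome_A, genome_B) are hypothetical.

```python
import csv
import io


def abundance_table(microbes, host_id, host_fraction):
    """Return (record_id, abundance) rows that sum to 1.

    Hypothetical helper: `microbes` maps record IDs to relative weights,
    and the host record receives a fixed share `host_fraction`.
    """
    if not 0 <= host_fraction < 1:
        raise ValueError("host_fraction must be in [0, 1)")
    total = sum(microbes.values())
    rows = [(host_id, host_fraction)]
    for record_id, weight in microbes.items():
        # Rescale microbial weights so they share the remaining fraction.
        rows.append((record_id, (1 - host_fraction) * weight / total))
    return rows


def write_abundance(rows, handle):
    # Assumed format: tab-separated, record ID then abundance, no header.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for record_id, abundance in rows:
        writer.writerow([record_id, f"{abundance:.6f}"])


rows = abundance_table({"genome_A": 3, "genome_B": 1}, "host_chr1", 0.5)
buf = io.StringIO()
write_abundance(rows, buf)
print(buf.getvalue(), end="")
# host_chr1	0.500000
# genome_A	0.375000
# genome_B	0.125000
```

The resulting file would then be passed to something like iss generate --genomes combined.fasta --abundance_file abundance.txt --model hiseq --output reads, where combined.fasta holds both the host and microbial records; check iss generate --help for the exact options in your version.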

/Hadrien

TheLokj commented 3 months ago

Hi, thanks for your answer!

Okay, so that's just a personal design choice made during the development of InSilicoSeq. I guess your point of view holds when someone is doing health-related metagenomics studies. I was thinking more of ecological metagenomics studies, where you definitely want eukaryotic DNA in your samples too: host DNA in the case of intra-organism studies, or general eukaryotic DNA (e.g. algae, fungi, protozoa, or in some rare cases metazoans) in the case of inter-organism studies, for example with marine samples. But you're definitely right, this field of study is still underdeveloped :)