NCBI-Hackathons / GeneHummus

An Automated Pipeline to Classify Gene Families based on Protein Domain Organization using Auxin Response Factors in Legumes as an Example

Info for non-model organism species #7

Closed KristinaGagalova closed 4 years ago

KristinaGagalova commented 4 years ago

Hi,

I have a question about non-model organism species. I was wondering if there is a way to combine newly discovered genes from non-model species with those that already have IDs and are in NCBI. I believe the tool is mainly applied to re-sequenced species, as in population genetics, for example. Is there any chance of getting it to work with other species?

Thank you in advance for your reply.

Kristina

jdieramon commented 4 years ago

Hi Kristina, as the pipeline is based mainly on the RefSeq database, your organism needs to have a taxonomy ID and, for best performance, a reference set of transcripts and/or proteins. Do you have any particular organism in mind?

KristinaGagalova commented 4 years ago

Hi, Thank you for the fast reply. My organism actually has a taxonomy id. I am working with spruce.

The major problem is that we don't have a reference proteome/transcriptome. Excluding the organellar genomes, my annotation will be the first one deposited on NCBI.

[Screenshot from 2019-12-23 11-01-59]

How is the performance without a reference set of sequences?

jdieramon commented 4 years ago

Hi Kristina, I have checked the databases and, exactly as you said, the protein database contains only entries from chloroplast and plastid. Only 74 are hosted by RefSeq. So, unfortunately, it won't work at this stage. However, let us know as soon as you upload your annotation and we'll run a demo on your sequences.
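For anyone who wants to run a similar check on their own organism, here is a minimal sketch of counting RefSeq protein records for a taxonomy ID through the NCBI E-utilities `esearch` endpoint. It is not taken from the GeneHummus code, and the function names are made up for illustration. The taxonomy ID in the usage note is only a placeholder that you should replace with your own, and the query assumes the usual Entrez `txid...[Organism:exp]` and `refseq[filter]` search syntax.

```python
import json
import urllib.parse
import urllib.request

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_refseq_protein_query(taxid: int) -> str:
    """Entrez term: proteins under this taxon, restricted to RefSeq records."""
    return f"txid{taxid}[Organism:exp] AND refseq[filter]"


def build_esearch_url(taxid: int) -> str:
    """Build the esearch URL for the protein database (hypothetical helper)."""
    params = {
        "db": "protein",
        "term": build_refseq_protein_query(taxid),
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_count(esearch_json: str) -> int:
    """Extract the hit count from an esearch JSON response."""
    # The count is returned as a string inside "esearchresult".
    return int(json.loads(esearch_json)["esearchresult"]["count"])


def count_refseq_proteins(taxid: int) -> int:
    """Query NCBI and return the number of RefSeq proteins (needs network)."""
    with urllib.request.urlopen(build_esearch_url(taxid), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `count_refseq_proteins(3827)` (placeholder taxonomy ID, swap in your own) performs a live request. A small or organelle-only count suggests the organism does not yet have the reference set the pipeline relies on.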

KristinaGagalova commented 4 years ago

Sounds good! Thank you for the help