eead-csic-compbio / metagenome_Pfam_score

Protocol for finding informative protein families and then using them to score metagenomic sets.
GNU General Public License v3.0

Easy way to add genes to the nitrogen pathway? #5

Open michoug opened 5 years ago

michoug commented 5 years ago

Hi, I'm wondering if there is an easy way to modify the nitrogen pathway so that the amoA gene from the uncultured archaeon is replaced by, or supplemented with, the same gene from the type strain Nitrosopumilus maritimus, which would better represent the archaeal version of the pathway. Thanks, Greg

valdeanda commented 5 years ago

Hi Greg, Yes, thanks for asking. We are working on version 1.2, which has the -custom option. With this option, MEBS downloads the Pfam database so that you can add all the Pfams you want to the mapping file.

For example, if you want to analyze only AmoA from archaea, I recommend modifying the pfam2kegg.tab file in the custom directory as follows:

PFAM KO PATHWAY PATHWAY NAME
PF12942 1 Ammonia monoxygenase Archaea AmoABC
PF04744 1 Ammonia monoxygenase Archaea AmoABC
PF04896 1 Ammonia monoxygenase Archaea AmoABC
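A minimal Python sketch of building such a custom mapping by keeping only selected Pfam accessions from a larger mapping table. It is not part of MEBS. It assumes the file is tab-separated with the Pfam accession in the first column and an empty KO field; both are guesses that should be checked against the real pfam2kegg.tab. The function name `filter_mapping` and the `PF99999` row are made up for illustration.

```python
import csv
import io


def filter_mapping(text, wanted_pfams):
    """Keep the header row plus rows whose first field is a wanted Pfam accession."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    kept = []
    for i, row in enumerate(reader):
        if not row:
            continue  # skip blank lines
        if i == 0 or row[0] in wanted_pfams:
            kept.append(row)
    return "\n".join("\t".join(r) for r in kept) + "\n"


# Hypothetical example content: the column layout (empty KO field) is an
# assumption, and PF99999 is a placeholder row, not a real mapping entry.
example = (
    "PFAM\tKO\tPATHWAY\tPATHWAY NAME\n"
    "PF12942\t\t1\tAmmonia monoxygenase Archaea AmoABC\n"
    "PF99999\t\t2\tUnrelated pathway (placeholder)\n"
)

custom = filter_mapping(example, {"PF12942", "PF04744", "PF04896"})
print(custom)
```

To apply this to a real file, read its contents into a string, pass it to `filter_mapping`, and write the result to the pfam2kegg.tab file in the custom directory.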

However, the nitrogen cycle already has the archaeal AmoABC as pathway 26: https://github.com/eead-csic-compbio/metagenome_Pfam_score/blob/master/cycles/nitrogen/pfam2kegg.tab

Be aware that the custom option is useful for computing the completeness of those pathways but not the score; the score has to be computed using the advanced mode.

As soon as the -custom option is implemented I will let you know. Meanwhile, you can try to focus only on N pathway 26 and see if that works for you. Thanks Val

michoug commented 5 years ago

Hi Val, Thank you for the answer, that will indeed be very useful. My issue as of now is that the amoA gene you chose for archaea (I found only one) doesn't appear to be a BLAST match to one of the main taxonomic groups possessing this gene in the archaeal domain, namely Nitrosopumilus. Best, Greg

valdeanda commented 5 years ago

Hi Greg, The -custom option is already available in MEBS (v1.2). Can you do a pull and tell me how that goes?

Which Pfam is the one that you found only in Nitrosopumilus? Best, Val

michoug commented 5 years ago

Hi Val, So it's probably a confusion on my part: the my_Pfam.nitrogen.hmm file does contain the families that I'm looking for. However, in nitrogen.fasta the amoA gene for archaea (tr|A0A023Q3R5|A0A023Q3R5_9ARCH Ammonia monooxygenase (Fragment) OS=uncultured archaeon GN=amoA PE=4 SV=1) doesn't BLAST to the main Nitrosopumilus that I'm looking for, hence my confusion. Would it be possible to clarify the role of nitrogen.fasta in the analysis, if any? It's not so clear to me. Thanks for your help. Best, Greg
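For anyone wanting to make the same check, here is a small Python sketch that lists the Pfam accessions present in an HMM file and reports which of the wanted ones are missing. It assumes the file is in HMMER3 text format, where each profile carries an `ACC` line with a versioned accession. The function name `pfam_accessions` and the sample record are made up for illustration; this is not a MEBS script.

```python
def pfam_accessions(hmm_text):
    """Collect accessions from ACC lines of HMMER3-format text, dropping the version suffix."""
    found = set()
    for line in hmm_text.splitlines():
        if line.startswith("ACC"):
            fields = line.split()
            if len(fields) >= 2:
                found.add(fields[1].split(".")[0])
    return found


# Made-up, heavily truncated record; the name and version number are placeholders.
sample = (
    "HMMER3/f [example header]\n"
    "NAME  example_family\n"
    "ACC   PF12942.1\n"
    "LENG  100\n"
    "//\n"
)

wanted = {"PF12942", "PF04744", "PF04896"}
missing = wanted - pfam_accessions(sample)
print("missing from HMM file:", sorted(missing))

# With a real file (path taken from the thread), one would instead do:
# with open("my_Pfam.nitrogen.hmm") as fh:
#     missing = wanted - pfam_accessions(fh.read())
```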

valdeanda commented 5 years ago

Hi Greg, Which protein family exactly are you looking for? The fasta file of each cycle contains representative sequences, which are used to obtain the protein families (Pfams) and then to compute the relative entropy and the score. If you are not interested in the score, use the custom option with the protein family that you want to analyze. It doesn't matter if it is not in the fasta file, because MEBS searches all the protein families in the Pfam database and displays only those in your mapping file. Let me know if that was helpful. P.S. Stage 1 of the MEBS paper describes the annotation of the sulfur genes; the paper for the rest of the cycles is not ready yet. :S https://academic.oup.com/gigascience/article/6/11/gix096/4561660 I can give you more information if you need it.

Best Val
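As a rough illustration of the relative entropy mentioned above, here is a generic Python sketch of a single Kullback-Leibler term, p * log2(p / q), computed per protein family. This is the textbook formula, not MEBS's own code. Treating p as a family's frequency in cycle-related genomes and q as its frequency in a background set is an assumption, and the frequencies below are invented numbers.

```python
import math


def relative_entropy_term(p, q):
    """One term p * log2(p / q) of the Kullback-Leibler divergence, in bits."""
    if p == 0:
        return 0.0  # by convention, 0 * log(0 / q) = 0
    if q == 0:
        raise ValueError("q must be positive when p is positive")
    return p * math.log2(p / q)


# Invented frequencies, for illustration only: (p, q) per Pfam accession.
frequencies = {
    "PF12942": (0.5, 0.25),
    "PF04744": (0.1, 0.1),
}

for pfam, (p, q) in frequencies.items():
    print(pfam, relative_entropy_term(p, q))
```

A family that is much more frequent in the genomes of interest than in the background gets a large positive value, while a family with equal frequencies in both gets zero.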

vrou1995 commented 4 years ago

Hi Val,

I was looking for the v1.2 version to install so that I could use the custom option but I've had no luck so far. Could you point me in the right direction?

Many thanks,

Vincent

valdeanda commented 4 years ago

Hi Vincent, I haven't updated the README yet, I'm so sorry. I promise to do it soon. In the MEBS release notes I specified the directions for using a custom set of Pfams. Here is the link: https://github.com/valdeanda/mebs/releases Let me know if that works for you; if you have any problems, I'm happy to help. Val

--

Dr. Valerie De Anda Postdoctoral Researcher

The University of Texas at Austin | Marine Science Institute

750 Channel View Dr. | 78373 Port Aransas, Texas

Website: https://valdeanda.github.io/

Github: valdeanda https://github.com/valdeanda

Twitter: @val_deanda https://twitter.com/val_deanda