Open drelo opened 3 years ago
Hi @drelo, yes, we are also using DeepBGC on metagenomic samples. Generally, the longer your sequences, the better. You can use --prodigal-meta-mode to run Prodigal in '-p meta' mode, which enables detecting more genes in short contigs.
The warnings should not be related. Can you check what you get in the output *.pfam.tsv file? Do you get any protein domains? There's also a deepbgc_score column that gives you a BGC probability for each protein domain.
If there are some protein domain hits, you can also run deepbgc with a lower --score threshold to change the BGC cutoff. You should be able to check evaluation/*.score.png to see which regions would become BGCs if you chose a lower threshold.
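A minimal sketch of the check described above: count the protein domain rows in a *.pfam.tsv file and see how many reach a given score. Only the deepbgc_score column name comes from this thread; the other column names and the example values are made up for illustration, and summarize_pfam_tsv is a hypothetical helper, not part of DeepBGC.

```python
import csv
import io


def summarize_pfam_tsv(handle, score_column="deepbgc_score", threshold=0.5):
    """Count domain rows and how many score at or above the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    total = 0
    at_or_above = 0
    for row in reader:
        total += 1
        try:
            score = float(row.get(score_column))
        except (TypeError, ValueError):
            # Score missing or not numeric: count the row, skip the comparison.
            continue
        if score >= threshold:
            at_or_above += 1
    return {"domains": total, "at_or_above_threshold": at_or_above}


# Made-up table standing in for a real *.pfam.tsv file (assumed columns).
example = io.StringIO(
    "sequence_id\tpfam_id\tdeepbgc_score\n"
    "contig_1\tPF00001\t0.91\n"
    "contig_1\tPF00002\t0.42\n"
    "contig_2\tPF00003\t0.10\n"
)
print(summarize_pfam_tsv(example, threshold=0.4))
```

With a real file, the same function could be called as summarize_pfam_tsv(open(path, newline="")). If "domains" comes back as 0, lowering the threshold will not help, and the input itself is the more likely problem.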
Thanks for your reply. I got no .tsv file and no files in the evaluation folder. Do I need to process the multi-FASTA from SPAdes before providing it to DeepBGC? What seems odd is that the FASTA file is over 600 Mb, yet it is processed really quickly. Thanks for your help!
(Sent by email in reply to David Příhoda's comment of Sat, 28 Nov 2020, 8:59.)
That sounds suspicious indeed. Can you try running deepbgc with the SPAdes contigs.fa file instead of the scaffolds file, adding the --prodigal-meta-mode flag? If that still fails, it would be great if I could see one of the sequences in that FASTA file.
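As a rough illustration of the suggested rerun, the sketch below assembles the command line without executing it. The --prodigal-meta-mode and --score flags are the ones named in this thread; the "pipeline" subcommand, the argument order, and the 0.3 score are assumptions to verify against the help output of your installed version. build_deepbgc_command is a hypothetical helper.

```python
import shlex


def build_deepbgc_command(fasta_path, score=None, meta_mode=True):
    """Assemble (but do not run) a deepbgc command line as a list of arguments."""
    cmd = ["deepbgc", "pipeline"]  # assumed subcommand, check your version's help
    if meta_mode:
        cmd.append("--prodigal-meta-mode")  # flag named in this thread
    if score is not None:
        cmd += ["--score", str(score)]  # flag named in this thread
    cmd.append(fasta_path)
    return cmd


cmd = build_deepbgc_command("contigs.fa", score=0.3)
# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be passed directly to subprocess.run(cmd, check=True) once the flags have been confirmed.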
I found the error: I am working on two clusters and by mistake I copied a file that had badly parsed FASTA headers. Now it is running smoothly.
A follow-up question (let me know if I should start a new issue): is there a way to combine results from more than one sample (from a similar environment, nearby area, etc.)?
Thanks for your help
(Sent by email in reply to David Příhoda's comment of Sat, 28 Nov 2020, 16:13.)
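A malformed FASTA file like the one described above can be caught before starting a run with a quick sanity check. The sketch below is only one possible set of checks (empty headers, duplicate IDs, sequence data before the first header, a stray '>' inside a sequence line); the thread does not say what exactly was wrong with the headers, so these checks are assumptions, and check_fasta is a hypothetical helper.

```python
import io
from collections import Counter


def check_fasta(handle):
    """Scan a FASTA stream and report header counts and obvious format problems."""
    ids = Counter()
    problems = []
    headers = 0
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            headers += 1
            fields = line[1:].split()
            if fields:
                ids[fields[0]] += 1  # the ID is the first word of the header
            else:
                problems.append((line_no, "empty header"))
        elif headers == 0:
            problems.append((line_no, "sequence data before first header"))
        elif ">" in line:
            problems.append((line_no, "'>' inside a sequence line"))
    duplicates = sorted(name for name, count in ids.items() if count > 1)
    return {"headers": headers, "duplicate_ids": duplicates, "problems": problems}


# Made-up example with three deliberate defects.
example = io.StringIO(
    ">NODE_1_length_1000_cov_5.0\nACGT\n"
    ">\nACGT\n"
    "AC>NODE_3\n"
    ">NODE_1_length_1000_cov_5.0\nGG\n"
)
print(check_fasta(example))
```

A clean file should report an empty "problems" list and no duplicate IDs; a header count far below what the assembler reported is another hint that the file was damaged in transfer.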
Great. What exactly do you mean by combining results?
For example, there are two samples from the same place but collected at different times that I would like to combine, just to get a glimpse of the diversity at that site (regardless of the temporal dimension). Or I might combine several samples from similar environments ('pooling' urban or rural). Now, with the results that are still accumulating, I noticed there is TSV output, so I think I could just parse and merge the files.
Best,
Andres
(Sent by email in reply to David Příhoda's comment of Sun, 29 Nov 2020, 10:51.)
Exactly, you can merge the TSV files (there's a BGC-level TSV and a protein domain-level TSV) or the GenBank files.
There's also a recent paper that introduces a method for visualizing BGCs called BGCViz, so you could give that a shot: https://github.com/pavlohrab/BGCViz or the web interface https://biopavlohrab.shinyapps.io/BGCViz/
BGCViz is relevant if you are also analysing your samples with other tools like antiSMASH.
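The TSV merge mentioned above could look roughly like this: concatenate the per-sample tables and add a column recording which sample each row came from. This is a sketch under assumptions, not a DeepBGC feature; the sample names and every column except deepbgc_score are made up, and merge_sample_tables and write_tsv are hypothetical helpers.

```python
import csv
import io


def merge_sample_tables(tables):
    """Merge tab-separated tables; `tables` maps sample name -> open file handle."""
    columns = ["sample"]
    merged = []
    for sample, handle in tables.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            # Drop overflow fields that DictReader stores under the key None.
            clean = {key: value for key, value in row.items() if key is not None}
            for name in clean:
                if name not in columns:
                    columns.append(name)
            # The added "sample" value overrides any existing column of that name.
            merged.append({**clean, "sample": sample})
    return columns, merged


def write_tsv(columns, rows, handle):
    """Write merged rows, leaving cells empty where a sample lacks a column."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Made-up tables standing in for BGC-level TSV files from two samples.
tables = {
    "site_a": io.StringIO("sequence_id\tdeepbgc_score\ncontig_1\t0.91\n"),
    "site_b": io.StringIO(
        "sequence_id\tdeepbgc_score\tproduct_class\ncontig_7\t0.77\tPolyketide\n"
    ),
}
columns, rows = merge_sample_tables(tables)
out = io.StringIO()
write_tsv(columns, rows, out)
print(out.getvalue())
```

Keeping the sample name as a column, rather than pooling rows anonymously, leaves the option of separating samples again later, for example to compare collection times at the same site.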
Thanks for all your help, I didn't know about BGCViz so that will be my next step in the exploration.
Cheers
(Sent by email in reply to David Příhoda's comment of Mon, 30 Nov 2020, 4:43.)
Dear users, I wonder if I can use DeepBGC with metagenomic samples? The paper describing the software mentions it as a useful tool for this kind of data, but I don't know if that is implemented in the current version. I ran a test with a sample (CPB-18), the scaffolds file obtained from SPAdes, and it quickly returned 0 matches. I don't understand whether this is a matter of the format I used or something else. The same file returned several matches (BGCs) with antiSMASH.
I noticed these lines while running it:
/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.tree.tree module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.tree. Anything that cannot be imported from sklearn.tree is now part of the private API.
  warnings.warn(message, FutureWarning)
/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.18.2 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/mnt/ubi/andres/miniconda3/envs/deepbio/lib/python3.7/site-packages/sklearn/base.py:334: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 0.18.2 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
Before that, I ran the BGC sample included in the test folder and obtained 2 hits, as seen in the log attached here (BGC15 file).
Maybe I have a broken install of the program; I followed the conda instructions. Please find attached the log from deepbgc info too.
pipeinfo.txt
BGC15.txt sample.txt
Thanks for your help.