Closed TanyaC505 closed 1 week ago
Hi @TanyaC505, thank you for using BGCFlow :)
Can you also share the log files here so we can debug the issue together? The log files can be found in these locations:
/home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/logs/bigscape/get_mibig_table.log
/home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/logs/bigscape/Rhodococcus_antismash_7.1.0/bigscape.log
Hi Matin,
Thanks so much for the help :)
Please see the attached logs.
Kind regards
Tanya
I see. For your first issue, the MIBiG download, it seems that the previous download was interrupted halfway and the file is corrupted. I will improve the script to detect this automatically, but for now, try removing the MIBiG files in your resources folder and then run it again:
rm -rf /home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/resources/mibig*
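Before deleting, it can help to confirm that the download really is truncated. Here is a minimal sketch, assuming the MIBiG download is a tar archive (possibly gzip-compressed); the helper name `archive_is_readable` and the file name in the usage comment are illustrative and not part of BGCFlow:

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every member of the tar archive at `path` can be read in full.

    A download that was cut off halfway typically fails partway through with a
    tar, gzip, or end-of-file error, all of which are treated as "not readable".
    """
    try:
        with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects compression
            for member in tar:
                if member.isfile():
                    handle = tar.extractfile(member)
                    # Read the member in 1 MiB chunks to force full decompression.
                    while handle.read(1 << 20):
                        pass
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True


# Hypothetical usage; replace the path with the actual file in your resources folder:
# print(archive_is_readable("resources/mibig_download.tar.gz"))
```

If this returns False, removing the file and re-running as described above is the way to go.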
For the second issue, it seems there is an error with the libgfortran library:
File "/home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/.snakemake/conda/1cb315120f65f8ad51e3c6450bedf9ee_/lib/python3.6/site-packages/sklearn/metrics/pairwise.py", line 30, in <module>
from .pairwise_fast import _chi2_kernel_fast, _sparse_manhattan
ImportError: libgfortran.so.3: cannot open shared object file: No such file or directory
Can you tell me which Linux/Ubuntu version you are using? I cannot reproduce the issue you're facing, but I suggest checking for and installing the latest gcc: https://github.com/NBChub/bgcflow/wiki/00-Installation-Guide#gcc-compiler
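To see which libgfortran versions a conda environment actually ships, you can list its lib directory. This is a minimal sketch, assuming the environment is the one under .snakemake/conda shown in the traceback; `list_env_libraries` is an illustrative helper, not part of BGCFlow or BiG-SCAPE:

```python
import sys
from pathlib import Path


def list_env_libraries(env_prefix, pattern="libgfortran.so*"):
    """Return the sorted names of files in <env_prefix>/lib that match `pattern`."""
    lib_dir = Path(env_prefix) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob(pattern))


if __name__ == "__main__":
    # sys.prefix is the environment of the interpreter running this script;
    # pass the conda environment path from the traceback to inspect that one.
    print(list_env_libraries(sys.prefix))
```

If libgfortran.so.3 is missing from the list, the environment does not provide the library that scikit-learn is trying to import.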
Ah, actually you're right, there is an issue with the installation script for BiG-SCAPE. I ran into the same problem with the missing dependency. I will check what can be done.
I fixed the script by pinning libgfortran to version 3.0.0. I will run tests before merging the changes into the main branch. For now, you can try the updated BGCFlow by following these steps:
git fetch # this will check if there are updates in the online repository
git pull # this will pull the update to the local directory
git checkout dev-1.1.1 # switch bgcflow from the main branch to the development branch
Then you should be able to run bgcflow normally. Let me know if this fixes your issue with bigscape.
PS: I will let you know when I have finished the tests; then you can go back to the main branch with:
git fetch
git pull
git checkout main
Thank you so much, I incorporated the above as suggested and there is a new error when running bgcflow:
LockException:
Error: Directory cannot be locked. Please make sure that no other Snakemake process is trying to create the same files in the following directory:
/home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow
If you are sure that no other instances of snakemake are running on this directory, the remaining lock was likely caused by a kill signal or a power loss. It can be removed with the --unlock argument.
Ah, that is a common error when the previous run is terminated forcefully (see https://snakemake.readthedocs.io/en/stable/project_info/faq.html#id30).
As instructed in the message, you can unlock the snakemake directory with:
bgcflow run --unlock
Then you should be able to run bgcflow normally again.
You can see the full command by typing bgcflow run --help
Thank you so much for all your help - bgcflow & report building worked successfully!
Hi Matin, although bgcflow and the report ran successfully, for some reason not all the antiSMASH results were incorporated into the BiG-SCAPE network. For example, 218 BGCs from 14 genomes were detected using antiSMASH, but BiG-SCAPE only incorporated 21 BGCs into the network and data processing. Do you have any recommendation on how I could fix this? Thank you.
Can you send me the log files for the BIGSCAPE run?
I see. Are you running your own genome sequences? What input file type are you using? Genbank or Fasta? You might want to make sure that your sequence accessions are unique.
I can see in the log that for each genome, the sequence accessions are named chromosome00001, etc.
What happened is that BiG-SCAPE detected 218 BGCs, but because some of them have redundant names like chromosome00001.region012, chromosome00001.region016, etc., BiG-SCAPE treated them as duplicates.
....
File data/interim/bigscape/Rhodococcus_antismash_7.1.0/cache/fasta/chromosome00001.region012.fasta already processed
Adding chromosome00001.region012.gbk (50590 bps)
File data/interim/bigscape/Rhodococcus_antismash_7.1.0/cache/fasta/chromosome00001.region016.fasta already processed
Adding chromosome00001.region016.gbk (49828 bps)
Starting with 218 files
Files that had its sequence extracted: 22
...
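To check which accessions are shared between genomes before re-running, something like the following could be used. It is a minimal sketch that only looks at FASTA header lines; the function names are illustrative, and the second genome and its sequence in the example are made up:

```python
from collections import defaultdict


def accessions(fasta_lines):
    """Yield the accession (first word after '>') of every FASTA header line."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip empty headers
                yield fields[0]


def shared_accessions(genomes):
    """Map each accession found in more than one genome to the genomes that use it.

    `genomes` maps a genome id to an iterable of FASTA lines (e.g. an open file).
    """
    seen = defaultdict(list)
    for genome_id, lines in genomes.items():
        for accession in set(accessions(lines)):
            seen[accession].append(genome_id)
    return {acc: ids for acc, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    example = {
        "rhodococcus_strain01": [">chromosome00001", "CGATGGTACA", ">chromosome00002"],
        "rhodococcus_strain02": [">chromosome00001", "GGCATTAC"],  # made-up genome
    }
    print(shared_accessions(example))
```

An empty result means no accession is reused across genomes, so BiG-SCAPE should no longer see region names from different genomes as duplicates.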
What I would suggest is to rename the sequence accessions in the FASTA (or GenBank) files so that they are unique. For example, if this is your original FASTA file:
rhodococcus_strain01.fasta
>chromosome00001
CGATGGTACA....
>chromosome00002
You can replace it by adding the genome ids into the sequence accession:
rhodococcus_strain01.fasta
>rhodococcus_strain01__chromosome00001
CGATGGTACA....
>rhodococcus_strain01__chromosome00002
Ideally, you will get this unique identifier when submitting your sequences to a repository such as NCBI. You can of course come up with any unique identifier.
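The renaming shown above can be scripted. Here is a minimal sketch, assuming plain-text FASTA input and using the file name (without extension) as the genome id; `prefix_fasta_headers` and `rename_fasta_file` are illustrative helpers, not part of BGCFlow:

```python
from pathlib import Path


def prefix_fasta_headers(fasta_lines, genome_id, sep="__"):
    """Yield FASTA lines with `genome_id` and `sep` prepended to every header."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield f">{genome_id}{sep}{line[1:]}"
        else:
            yield line  # sequence lines are passed through unchanged


def rename_fasta_file(src, dst, genome_id=None):
    """Write a copy of `src` to `dst` with prefixed headers.

    If `genome_id` is not given, the file name without its extension is used.
    """
    src, dst = Path(src), Path(dst)
    genome_id = genome_id or src.stem
    with src.open() as fin, dst.open("w") as fout:
        fout.writelines(prefix_fasta_headers(fin, genome_id))


# Hypothetical usage (the output folder is an assumption):
# rename_fasta_file("rhodococcus_strain01.fasta", "renamed/rhodococcus_strain01.fasta")
```

Writing to a new file rather than overwriting the input keeps the original sequences available in case the chosen identifiers need to change.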
This of course means that you need to re-run the whole workflow.
I will add this to the FAQ list.
I have my own genome sequences that are in fasta format. I have renamed the sequence accession within each file and rerun the bgcflow. I am confident this will solve the issue. Thanks so much for your help!
Sounds great! 👍
I suggest removing the previous data/interim folder (or even the whole data folder) to make sure everything is regenerated correctly.
@TanyaC505 I've merged the update to the main branch (74666bb2730951b772b92733830c8643dc621963) so you can now switch back using git checkout main.
Thanks again for the feedback and improving BGCFlow!
Any recommendations on possible solutions to the errors below, which occurred while running bgcflow, would be greatly appreciated. Thanks
Error in rule get_mibig_table:
    jobid: 173
    output: resources/mibig/json, resources/mibig/df_mibig_bgcs.csv
    log: logs/bigscape/get_mibig_table.log (check log file(s) for error details)
    conda-env: /home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/.snakemake/conda/61b5332396c2d7bb1ce5092174e049db
    shell:
Error in rule bigscape:
    jobid: 170
    input: resources/BiG-SCAPE, data/interim/bgcs/Rhodococcus/Rhodococcus_antismash_7.1.0.csv, data/interim/bgcs/Rhodococcus/7.1.0
    output: data/interim/bigscape/Rhodococcus_antismash_7.1.0/index.html
    log: logs/bigscape/Rhodococcus_antismash_7.1.0/bigscape.log (check log file(s) for error details)
    conda-env: /home/mk-mica-minion/Desktop/Tanya/Rhodococcus/Thirdtry/bgcflow/.snakemake/conda/1cb315120f65f8ad51e3c6450bedf9ee_
    shell: