VGP / vgp-assembly

VGP repository for the genome assembly working group

BLAST options error: File .fasta does not exist #82

Closed Xnisongurumayum closed 1 year ago

Xnisongurumayum commented 1 year ago

Hello, I want to use the mitoVGP assembly pipeline to extract the full mitogenome for a few bird species. While running mitoVGP, I encountered the same error when executing the following command (as given in https://github.com/VGP/vgp-assembly/blob/master/mitoVGP/README.md):

$ ./mitoVGP -a pacbio -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 -b variantCaller

Command with BAM data on the local disk:

$ ./mitoVGP -a pacbio -s Gallus_gallus -i bGalGal1 -r ./bGalGal1.MT.20191002.fasta -t 18 -1 /DATA2/rawdata/pacbio/PAPH_A_m64292e_230113_084435.subreads.bam -b variantCaller

ERROR:
New DB title: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/canu/fMasArm1.contigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/canu/fMasArm1.contigs.fasta does not exist

ERROR:
New DB name: /DATA2/black_francolin_analysis/software/mitoVGP/Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/blast/bGalGal1.db
New DB title: Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/canu/bGalGal1.contigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
BLAST options error: File Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/canu/bGalGal1.contigs.fasta does not exist

Please help me resolve this and assemble the mitogenome, thank you so much!
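Both BLAST errors above point at a Canu contigs file that was never written, so the failure most likely happened earlier, at the assembly step. As a rough sketch (not part of mitoVGP), the check below rebuilds the path from the error messages and reports whether the file is present. The directory layout is copied from the messages in this thread; the function names are made up for illustration.

```python
from pathlib import Path


def expected_contigs_path(species: str, sample_id: str, root: str = ".") -> Path:
    """Build the Canu contigs path that the BLAST step complains about.

    The layout is copied from the error messages in this thread; it is an
    assumption here, not something read from the mitoVGP source code.
    """
    return (
        Path(root) / species / sample_id / "assembly_MT_rockefeller"
        / "intermediates" / "canu" / f"{sample_id}.contigs.fasta"
    )


def contigs_ready(species: str, sample_id: str, root: str = ".") -> bool:
    """Return True only if the contigs file exists and is not empty."""
    path = expected_contigs_path(species, sample_id, root)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    runs = [("Mastacembelus_armatus", "fMasArm1"), ("Gallus_gallus", "bGalGal1")]
    for species, sample_id in runs:
        status = "found" if contigs_ready(species, sample_id) else "missing or empty"
        print(f"{expected_contigs_path(species, sample_id)}: {status}")
```

If the file is reported as missing or empty, the Canu section of the main pipeline log is the place to look next.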

gf777 commented 1 year ago

Hi @Xnisongurumayum

Thanks for reaching out. So, if I understand correctly, your test job with fMasArm1 is also failing. Could you please share the log file for the main job, which includes the Canu step? It should be something like this:

Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/log/fMasArm1_mtDNApipe_20200624-212549.out

Best

Xnisongurumayum commented 1 year ago

Hello @gf777, I have attached the log files generated in the directory:

1. Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/log/fMasArm1_mtDNApipe_20200624-212549.out
2. Gallus_gallus/bGalGal1/assembly_MT_rockefeller/intermediates/log/bGalGal1_mtDNApipe_20230322-200511.out

bGalGal1_mtDNApipe_20230322-200511.zip
fMasArm1_mtDNApipe_20230323-130039.zip

Thank you for your time.

Best regards

gf777 commented 1 year ago

Hi @Xnisongurumayum These log files are almost completely empty. Are you sure you haven't truncated them somehow? For reference, the fMasArm1 file should look like the one attached.

Best

fMasArm1_mtDNApipe_20200624-212549.out.zip
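A quick way to spot nearly empty logs before attaching them is to list the small ones. This is a minimal sketch, not part of mitoVGP: the 1024-byte threshold is an arbitrary assumption, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def small_logs(log_dir: str, min_bytes: int = 1024) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for each .out file below min_bytes.

    min_bytes is an arbitrary illustrative threshold, not a mitoVGP setting.
    """
    found = []
    for path in sorted(Path(log_dir).glob("*.out")):
        size = path.stat().st_size
        if size < min_bytes:
            found.append((path.name, size))
    return found
```

It could be pointed at a log directory such as Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/log.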

Xnisongurumayum commented 1 year ago

Dear Giulio, thank you for the fast reply. I have not done anything with my original data set besides running the default command. I am now running the example command on another system, but it gives me a new error (below):

++++ running: trimmer2 ++++
Species: -s Mastacembelus_armatus
Species ID: -i fMasArm1
Contig number: -n tig00000001
Number of threads: -t 50
Working directory: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates
cat: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/freebayes_round2/fMasArm1.tig00000001_polish2_10x1_trim1_10x2.fasta: No such file or directory

++++ running: linearizePhe ++++
Species: -s Mastacembelus_armatus
Species ID: -i fMasArm1
Contig number: -n tig00000001
Number of threads: -t 50
Working directory: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates
--Annotating tRNAs:
Phenylalanine coordinates:

Also, here is the list of log files I have after this error:

fMasArm1_blastMT_20230324-130358.out
fMasArm1_blastMT_20230324-140341.out
fMasArm1_blastMT_20230324-140629.out
fMasArm1_blastMT_20230324-182720.out
fMasArm1_linearizePhe_20230324-140341.out
fMasArm1_linearizePhe_20230324-161900.out
fMasArm1_linearizePhe_20230324-182720.out
fMasArm1_map10x1_20230324-130421.out
fMasArm1_map10x1_20230324-140341.out
fMasArm1_map10x1_20230324-140629.out
fMasArm1_map10x1_20230324-182720.out
fMasArm1_map10x2._20230324-140341.out
fMasArm1_map10x2._20230324-161900.out
fMasArm1_map10x2._20230324-182720.out
fMasArm1_mitoPolish_20230324-130400.out
fMasArm1_mitoPolish_20230324-140341.out
fMasArm1_mitoPolish_20230324-140629.out
fMasArm1_mitoPolish_20230324-182720.out
fMasArm1_mtDNApipe_20230324-113717.out
fMasArm1_mtDNApipe_20230324-140341.out
fMasArm1_mtDNApipe_20230324-140629.out
fMasArm1_mtDNApipe_20230324-182720.out
fMasArm1_trimmer_20230324-140341.out
fMasArm1_trimmer_20230324-161825.out
fMasArm1_trimmer_20230324-182720.out
fMasArm1_trimmer2_20230324-140341.out
fMasArm1_trimmer2_20230324-161900.out
fMasArm1_trimmer2_20230324-182720.out
long_reads_file_list_20230324-113717.txt

Below are the directories inside the intermediates directory:

blast
bowtie2_round1
bowtie2_round2
canu
freebayes_round1
freebayes_round2
linearizePhe
log
polish
reference
tgs_bam
tgs_MT_extracted_reads
trimmed

Please let me know what I have to do to resolve this error.

Thank you

Best


gf777 commented 1 year ago

Hi @Xnisongurumayum

To me, all these errors are really strange. I'll try to run the test example again locally to see if anything has changed. Is it possible that you ran into trouble initially and did not delete the output folder before rerunning (there are too many log files)?
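To see how many separate runs left logs behind, the file names can be grouped by pipeline step and timestamp. The sketch below assumes the naming pattern seen in this thread (sample ID, step, `YYYYMMDD-HHMMSS`, `.out`); the regular expression and function name are illustrative, not part of mitoVGP.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Pattern inferred from the log names posted in this thread, for example
# fMasArm1_mtDNApipe_20230324-113717.out (an assumption, not a documented format).
LOG_NAME = re.compile(r"^(?P<sample>[^_]+)_(?P<step>.+)_(?P<stamp>\d{8}-\d{6})\.out$")


def runs_per_step(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each pipeline step to the sorted timestamps of its log files."""
    runs = defaultdict(list)
    for name in names:
        match = LOG_NAME.match(name)
        if match:  # names that do not fit the pattern are skipped
            runs[match.group("step")].append(match.group("stamp"))
    return {step: sorted(stamps) for step, stamps in runs.items()}
```

More than one timestamp for the main `mtDNApipe` step suggests that several runs wrote into the same output folder.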

Xnisongurumayum commented 1 year ago

Hi @gf777, I did delete the generated folder (Mastacembelus_armatus). I will start another run with the example set and send you the log files it generates. Thank you. Best

gf777 commented 1 year ago

Hi @Xnisongurumayum

I did a dry run myself and indeed a few things were broken as a consequence of changes in modules here and there. It should be fixed now (make sure you use the main repo: https://github.com/gf777/mitoVGP). It worked for me.

Note that a newer version of numpy (1.16.6) now apparently needs to be installed for variantCaller to work. Reinstall the conda environment or try: pip install numpy --upgrade
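To confirm which numpy version the active environment actually provides, something like the following can be run inside it. This is a sketch that treats 1.16.6 as the minimum required version, which is an interpretation of the note above rather than a documented requirement; the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.16.6' into (1, 16, 6), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def numpy_is_recent_enough(installed: str, minimum: str = "1.16.6") -> bool:
    """Compare version strings; the 1.16.6 default is assumed from this thread."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_numpy_version() -> Optional[str]:
    """Return the installed numpy version, or None if numpy is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("numpy is not installed in this environment")
    else:
        verdict = "OK" if numpy_is_recent_enough(found) else "too old"
        print(f"numpy {found}: {verdict}")
```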

Finally, note that you also need short reads for mitoVGP to work (at the later stages).

Thanks for reporting this bug!

Best

Xnisongurumayum commented 1 year ago

Hi @gf777, thank you very much. I have updated numpy ($ pip install numpy --upgrade), and it worked on the example set: the run completed successfully with no errors this time. Is the assembled mitogenome the "Final sequence: fMasArm1 Mitogenome size: 16490 bp" reported in Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates/log/fMasArm1_linearizePhe_20230327-174449.out?

Also, what can I do if I do not have short reads? Please let me know. I'm very sorry for the inconvenience.

Thank you

Best regards,

gf777 commented 1 year ago

Hi @Xnisongurumayum

No, the final assembly will be under: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/fMasArm1.MT.20230326.fasta.gz
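To check the size of the final assembly directly from the gzipped FASTA, a small reader like the one below is enough. It is a minimal sketch using only the Python standard library; the function name is hypothetical and the file is assumed to be a plain gzip-compressed FASTA.

```python
import gzip
from typing import Dict


def fasta_lengths(path: str) -> Dict[str, int]:
    """Return {sequence name: length in bp} for a gzip-compressed FASTA file."""
    lengths: Dict[str, int] = {}
    name = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""  # first word of the header
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths
```

Calling `fasta_lengths` on the `.fasta.gz` path given above returns one entry per sequence, which can be compared with the mitogenome size printed in the linearizePhe log.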

Illumina data is used for polishing (see the flowchart on GitHub or in the paper). If you don't have it, the pipeline will essentially stop after the long-read assembly and polishing. Depending on the available long-read coverage, the result may or may not be of significantly lower quality. You will also have to trim the sequence manually: being circular, it will usually have several overlaps, and you can check these and find where to trim by self-alignment.
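The idea behind trimming a circular contig can be illustrated with exact string matching: find the smallest shift at which the sequence matches itself and keep only the bases before it. This is a simplified stand-in for a real self-alignment, which would tolerate the mismatches and indels found in an unpolished long-read assembly. The function name and the `min_overlap` default are assumptions made for illustration.

```python
def trim_circular_overlap(seq: str, min_overlap: int = 20) -> str:
    """Trim the repeated tail of a contig that wraps around a circular genome.

    Looks for the smallest shift at which the sequence matches itself exactly
    (seq[shift:] == seq[:len(seq) - shift]) over at least min_overlap bases,
    and returns the first `shift` bases. Returns seq unchanged if no such
    shift exists. Exact matching only: this is not a real aligner.
    """
    n = len(seq)
    for shift in range(1, n - min_overlap + 1):
        if seq[shift:] == seq[: n - shift]:
            return seq[:shift]
    return seq


if __name__ == "__main__":
    genome = "ACGTTGCAAGGCTTAC"          # toy circular genome
    contig = genome + genome[:6]         # assembly that runs past the origin
    print(trim_circular_overlap(contig, min_overlap=5))
```

On real data the overlapping ends rarely match base for base, so the trim point is better taken from an alignment of the contig against itself.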

Hope this helps

Xnisongurumayum commented 1 year ago

Hello @gf777, thank you for your help. I have successfully assembled the mitogenome (I ran it multiple times, just to make sure that I don't encounter any errors again).

Best

gf777 commented 1 year ago

Hi @Xnisongurumayum

Glad I was of assistance!

All the best