Niloofar-Alaei commented 3 years ago

Hi,

I want to use mitoVGP to assemble the full mitogenome for a group of birds. I have a reference genome from a closely related species and, as you mention in your paper, I also have PacBio data (number of reads: 386038492; length of reads: 53852369634) and 10x data (R1: 193019246 reads, length of reads 29145906146; the same for R2) for one of the samples. I used this command to run mitoVGP on my dataset:

./mitoVGP \
  -a pacbio \
  -s Taeniopygia_guttata \
  -i bTaeGut2 \
  -r data_REF.fasta \
  -t ${NSLOTS:-1} \
  -1 $my_PACBIO_data \
  -2 $my_10xdata_R1 $my_10xdata_R2 \
  -b variantCaller

My first question: I am not sure whether the way I passed my datasets (10x data and PacBio data) is correct.

Then it ran for more than 3 weeks, until the time limit I had set for the job expired, and it stopped without producing any result.

How long does this analysis normally take? And do you have any suggestions to help me run it?

I look forward to hearing from you.

The best Niloo

gf777 commented 3 years ago

Dear Niloo,

sorry to hear that. I think we can easily debug your issue. First of all, the pipeline shouldn't take more than a few hours to run, depending on the number of CPUs available, and in many cases it takes about 20 minutes. You should have several log files that would let me identify the issue. Can you post the main log file (i.e. the stdout)?

thanks

Giulio

Niloofar-Alaei commented 3 years ago

Hi dear Giulio

Thanks a lot for your email. I have a folder named log that contains these 5 files:

bTaeGut2_blastMT_20200911-115612.out
bTaeGut2_mtDNApipe_20200810-160212.out
bTaeGut2_mtDNApipe_20200911-115612.out
long_reads_file_list_20200810-160212.txt
long_reads_file_list_20200911-115612.txt

The last one is also empty. Could you please let me know which file you need? Then I can send it to you. I can also send you my script, in case I made a mistake in it.

With the best Niloo

gf777 commented 3 years ago

Hi Niloo, just copy-paste them all. The fact that the long-read file lists are empty is the most likely explanation: no reads were loaded and the pipeline failed immediately. Is the -1 $my_PACBIO_data variable set correctly? It should point to a file containing the absolute paths of your PacBio BAM files.
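A quick way to rule out an empty or mis-specified list before launching a long job is to check it yourself. The function below is a hypothetical helper, not part of mitoVGP; it assumes one path per line and flags an empty list, relative paths, and unreadable files.

```shell
#!/bin/sh
# Hypothetical helper (not part of mitoVGP): sanity-check a read-list file of the
# kind passed to -1/-2. Assumes one path per line; blank lines are skipped.
# `mitoVGP -h` describes the actual requirements.
check_read_list() {
    list=$1
    # An empty or missing list means no reads can be loaded at all.
    if [ ! -s "$list" ]; then
        echo "ERROR: $list is missing or empty" >&2
        return 1
    fi
    rc=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -z "$entry" ] && continue
        case $entry in
            /*) ;;  # absolute path, as required
            *) echo "ERROR: not an absolute path: $entry" >&2; rc=1; continue ;;
        esac
        if [ ! -r "$entry" ]; then
            echo "ERROR: cannot read: $entry" >&2
            rc=1
        fi
    done < "$list"
    return "$rc"
}

# Example:
# check_read_list list_pacbiodata.txt && echo "PacBio list looks OK"
```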

Niloofar-Alaei commented 3 years ago

Hi Giulio

OK, maybe I see my problem ;).

I pointed these two variables, -1 $pacbio and -2 $R1 $R2, to the FASTQ files of the PacBio and 10x data, respectively.

Should I use BAM files for both of them?

The best Niloo

gf777 commented 3 years ago

Hi Niloo, the two files you need to provide are simple text files listing the absolute paths to your .bam files (for PacBio) and to your FASTQ files (for Illumina). You cannot use FASTQ files for PacBio, as the pipeline will break at the Arrow polishing step. Also, -2 takes a single list in one file. Please check the exact requirements for these two files (-1, -2) using mitoVGP -h.
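One way to produce such list files is sketched below. The directory arguments and the file-name patterns (*.subreads.bam, *.fastq, *.fastq.gz) are assumptions, and `mitoVGP -h` remains the authority on the expected format. The list names match the ones that appear later in this thread's logs (list_pacbiodata.txt, list_10xdata.txt).

```shell
#!/bin/sh
# Sketch: write the list files for -1 (PacBio subreads BAMs) and -2 (10x/Illumina FASTQs).
# Assumptions: BAMs end in .subreads.bam, FASTQs end in .fastq or .fastq.gz,
# and the lists are written to the current directory.
make_read_lists() {
    # Resolve both directories to absolute paths, since the lists must contain absolute paths.
    bam_dir=$(cd "$1" && pwd) || return 1
    fq_dir=$(cd "$2" && pwd) || return 1
    # One absolute path per line, sorted for a stable order (e.g. R1 before R2).
    find "$bam_dir" -maxdepth 1 -name '*.subreads.bam' | sort > list_pacbiodata.txt
    find "$fq_dir" -maxdepth 1 \( -name '*.fastq' -o -name '*.fastq.gz' \) | sort > list_10xdata.txt
}

# Example (hypothetical directories):
# make_read_lists /path/to/pacbio_bamfiles /path/to/10x_fastqfiles
# ./mitoVGP ... -1 list_pacbiodata.txt -2 list_10xdata.txt ...
```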

Best

Niloofar-Alaei commented 3 years ago

Hi dear Giulio, thanks a lot for your help. I will run it again with the correct settings.

The best Niloo

gf777 commented 3 years ago

You are very welcome, let me know how it goes!

Niloofar-Alaei commented 3 years ago

Dear Giulio

I ran it following your advice and it runs :)

For -1 and -2, I used text files containing the exact paths of my 18 PacBio BAM files and of the two 10x FASTQ files (R1 and R2), respectively.

After around 20 min, I received this error message:

Align...
Error: could not open Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1_trim1.fasta
Error: Encountered internal Bowtie 2 exception (#1)

When I look in Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/, there is no file named bTaeGut2.tig00000001_polish2_10x1_trim1.fasta (there is one with the same name but with the .delta extension).

I found 10 log files; I have attached all of them below.

I am really grateful to you for your help

With the best Niloo

++++ running: blastMT ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Reference mitocontig: -r data_REF.fasta

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

Building a new DB, current time: 09/16/2020 10:59:44
New DB name: /gpfs1/work/alaeikak/mitoVGP/Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/blast/bTaeGut2.db
New DB title: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/canu/bTaeGut2.contigs.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 1 sequences in 0.030797 seconds.

query_acc.ver subject_acc.ver %_identity alignment_length mismatches gap_opens q.start q.end s.start s.end evalue bitscore
NC_040290.1 tig00000001 91.781 146 12 0 7726 7871 13667 13812 1.34e-53 204

score tig identity*coverage is_circular read_number
670000 tig00000001 13400 no 5
best candidate is probably contig: tig00000001

++++ running: linearizePhe ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

--Annotating tRNAs:

Unable to open Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed2/bTaeGut2.tig00000001_polish2_10x1_trim1_10x2_trim2.fasta for reading. Aborting program.

++++ running: map10x1 ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Local short read data: -2 list_10xdata.txt

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

--Following PE files found: /work/alaeikak/mitoVGP/10x_fastqfiles/data_10x_R1.fastq /work/alaeikak/mitoVGP/10x_fastqfiles/data_10x_R2.fastq 0

--Align: /work/alaeikak/mitoVGP/10x_fastqfiles/data_10x_R1.fastq /work/alaeikak/mitoVGP/10x_fastqfiles/data_10x_R2.fastq

193019246 reads; of these:
  193019246 (100.00%) were paired; of these:
    193017531 (100.00%) aligned concordantly 0 times
    1715 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

193017531 pairs aligned concordantly 0 times; of these:
  407 (0.00%) aligned discordantly 1 time
----
193017124 pairs aligned 0 times concordantly or discordantly; of these:
  386034248 mates make up the pairs; of these:
    386033569 (100.00%) aligned 0 times
    679 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

0.00% overall alignment rate

Merging bam files...
[M::bam2fq_mainloop] discarded 679 singletons
[M::bam2fq_mainloop] processed 4923 reads

--Sort and index the alignment:
[bam_sort_core] merging from 0 files and 24 in-memory blocks...

--Sorting and indexing completed.

--Variant calling and polishing:

Variant calling...

index file Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round2/bTaeGut2.tig00000001_polish2.fasta.fai not found, generating...

Polishing...

Lines total/split/realigned/skipped: 218/0/200/0

--Variant calling and polishing completed.

++++ running: map10x2 ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

--Generate sorted alignment:

Align...
Error: could not open Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1_trim1.fasta
Error: Encountered internal Bowtie 2 exception (#1)
Command: /gpfs0/home/alaeikak/anaconda3/envs/mitoVGP_pacbio/bin/bowtie2-build-s --wrapper basic-0 --threads 24 -q Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1_trim1.fasta Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/bowtie2_round2/bTaeGut2
(ERR): "Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/bowtie2_round2/bTaeGut2" does not exist or is not a Bowtie 2 index
Exiting now ...

++++ running: mitoPolish ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Long read platform: -a pacbio

Pacbio variant caller: -b variantCaller

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

--First round of polishing:

33 reads were trimmed by Canu of which 5 were used in the assembly of contig tig00000001
INFO 2020-09-16 10:59:47 RevertSam

** NOTE: Picard's command line syntax is changing.

** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)

** The command line looks like this in the new syntax:

** RevertSam -I Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/BAM1.bam -O Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/uBAM1.bam -MAX_DISCARD_FRACTION 0.005 -ATTRIBUTE_TO_CLEAR XT -ATTRIBUTE_TO_CLEAR XN -ATTRIBUTE_TO_CLEAR AS -ATTRIBUTE_TO_CLEAR OC -ATTRIBUTE_TO_CLEAR OP -SORT_ORDER unsorted -RESTORE_ORIGINAL_QUALITIES true -REMOVE_DUPLICATE_INFORMATION true -REMOVE_ALIGNMENT_INFORMATION true -VALIDATION_STRINGENCY STRICT

10:59:48.107 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs0/home/alaeikak/anaconda3/envs/mitoVGP_pacbio/share/picard-2.20.6-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Sep 16 10:59:48 CEST 2020] RevertSam INPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/BAM1.bam OUTPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/uBAM1.bam SORT_ORDER=unsorted RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS, XT, XN, AS, OC, OP] MAX_DISCARD_FRACTION=0.005 VALIDATION_STRINGENCY=STRICT OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic SANITIZE=false KEEP_FIRST_DUPLICATE=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Sep 16 10:59:48 CEST 2020] Executing as alaeikak@frontend1 on Linux 3.10.0-1127.18.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.6-SNAPSHOT
[Wed Sep 16 10:59:48 CEST 2020] picard.sam.RevertSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=536870912
INFO 2020-09-16 10:59:49 FilterSamReads

** NOTE: Picard's command line syntax is changing.

** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)

** The command line looks like this in the new syntax:

** FilterSamReads -I Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh.bam -O Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh_tig00000001.bam -READ_LIST_FILE Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/bTaeGut2_tig00000001_names.txt -FILTER includeReadList -VALIDATION_STRINGENCY STRICT

10:59:49.372 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs0/home/alaeikak/anaconda3/envs/mitoVGP_pacbio/share/picard-2.20.6-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Sep 16 10:59:49 CEST 2020] FilterSamReads INPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh.bam FILTER=includeReadList READ_LIST_FILE=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/bTaeGut2_tig00000001_names.txt OUTPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh_tig00000001.bam VALIDATION_STRINGENCY=STRICT WRITE_READS_FILES=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Sep 16 10:59:49 CEST 2020] Executing as alaeikak@frontend1 on Linux 3.10.0-1127.18.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.6-SNAPSHOT
INFO 2020-09-16 10:59:49 FilterSamReads Filtering [presorted=true] bTaeGut2.realigned_raw_reads_rh.bam -> OUTPUT=bTaeGut2.realigned_raw_reads_rh_tig00000001.bam [sortorder=unsorted]
INFO 2020-09-16 10:59:49 FilterSamReads 5 SAMRecords written to bTaeGut2.realigned_raw_reads_rh_tig00000001.bam
[Wed Sep 16 10:59:49 CEST 2020] picard.sam.FilterSamReads done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=536870912
variantCaller Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh_tig00000001_sorted.bam -r Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/bTaeGut2.tig00000001.fasta -o Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/bTaeGut2.tig00000001_polish.fasta --algorithm=arrow -j 24

--First round completed.

--Second round of polishing:

INFO 2020-09-16 10:59:51 RevertSam

** NOTE: Picard's command line syntax is changing.

** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)

** The command line looks like this in the new syntax:

** RevertSam -I Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh_tig00000001_sorted.bam -O Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/uBAM2.bam -MAX_DISCARD_FRACTION 0.005 -ATTRIBUTE_TO_CLEAR XT -ATTRIBUTE_TO_CLEAR XN -ATTRIBUTE_TO_CLEAR AS -ATTRIBUTE_TO_CLEAR OC -ATTRIBUTE_TO_CLEAR OP -SORT_ORDER unsorted -RESTORE_ORIGINAL_QUALITIES true -REMOVE_DUPLICATE_INFORMATION true -REMOVE_ALIGNMENT_INFORMATION true -VALIDATION_STRINGENCY STRICT

10:59:52.018 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gpfs0/home/alaeikak/anaconda3/envs/mitoVGP_pacbio/share/picard-2.20.6-0/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Sep 16 10:59:52 CEST 2020] RevertSam INPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/picard/bTaeGut2.realigned_raw_reads_rh_tig00000001_sorted.bam OUTPUT=Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/polish/polish_round1/uBAM2.bam SORT_ORDER=unsorted RESTORE_ORIGINAL_QUALITIES=true REMOVE_DUPLICATE_INFORMATION=true REMOVE_ALIGNMENT_INFORMATION=true ATTRIBUTE_TO_CLEAR=[NM, UQ, PG, MD, MQ, SA, MC, AS, XT, XN, AS, OC, OP] MAX_DISCARD_FRACTION=0.005 VALIDATION_STRINGENCY=STRICT OUTPUT_BY_READGROUP=false OUTPUT_BY_READGROUP_FILE_FORMAT=dynamic SANITIZE=false KEEP_FIRST_DUPLICATE=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Sep 16 10:59:52 CEST 2020] Executing as alaeikak@frontend1 on Linux 3.10.0-1127.18.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.1-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.6-SNAPSHOT
[Wed Sep 16 10:59:52 CEST 2020] picard.sam.RevertSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=536870912

--Second round completed.

++++ running: mtDNApipe ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Reference: -r data_REF.fasta

Genome size: -g 16812

Number of threads: -t 24

Long read platform: -a pacbio

Local long read data: -1 list_pacbiodata.txt

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

--Following long read files found:
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191029_092331.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191029_193656.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191030_055118.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191030_160534.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191031_021956.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191031_123442.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_020719.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_122601.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_224102.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191103_191000.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191104_052508.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191104_153949.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191105_135045.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191026_102927.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191026_204557.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191030_144218.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191031_005629.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191031_111041.subreads.bam

convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191029_092331.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191029_193656.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191031_123442.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191102_020719.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191102_224102.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191103_191000.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191104_052508.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191104_153949.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54032_191105_135045.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54259_191026_102927.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54259_191026_204557.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54259_191030_144218.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54259_191031_005629.subreads.bam to fastq
convert: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_bam/aligned_m54259_191031_111041.subreads.bam to fastq

extracted 49 reads

canu-1.8/Linux-amd64/bin/canu -p bTaeGut2 -d Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/canu useGrid=false genomeSize=16812 -pacbio-raw Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_MT_extracted_reads/bTaeGut2.fastq.gz

++++ running: trimmer ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

[bam_sort_core] merging from 0 files and 24 in-memory blocks...
[M::bam2fq_mainloop] discarded 679 singletons
[M::bam2fq_mainloop] processed 4923 reads
2122 reads; of these:
  2122 (100.00%) were paired; of these:
    409 (19.27%) aligned concordantly 0 times
    1713 (80.73%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

409 pairs aligned concordantly 0 times; of these:
  126 (30.81%) aligned discordantly 1 time

86.66% overall alignment rate
[bam_sort_core] merging from 0 files and 24 in-memory blocks...

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS

reading input file "Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref" of length 24497

construct suffix tree for sequence of length 24497

(maximum reference length is 536870908)

(maximum query length is 4294967295)

CONSTRUCTIONTIME /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.00

reading input file "/gpfs1/work/alaeikak/mitoVGP/Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/intermediate.fasta" of length 24496

matching query-file "/gpfs1/work/alaeikak/mitoVGP/Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/intermediate.fasta"

against subject-file "Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref"

COMPLETETIME /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.01

SPACE /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.05

4: FINISHING DATA

++++ running: trimmer2 ++++

Species: -s Taeniopygia_guttata

Species ID: -i bTaeGut2

Contig number: -n tig00000001

Number of threads: -t 24

Working directory: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates

cat: Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/freebayes_round2/bTaeGut2.tig00000001_polish2_10x1_trim1_10x2.fasta: No such file or directory

/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191029_092331.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191029_193656.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191030_055118.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191030_160534.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191031_021956.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191031_123442.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_020719.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_122601.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191102_224102.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191103_191000.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191104_052508.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191104_153949.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54032_191105_135045.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191026_102927.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191026_204557.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191030_144218.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191031_005629.subreads.bam
/work/alaeikak/mitoVGP/pacbio_bamfiles/m54259_191031_111041.subreads.bam

gf777 commented 3 years ago

Hi Niloo, thanks for sharing this. It seems the issue with loading the raw data is now fixed. The Canu assembly seems to work, and so does the long-read polishing. But I am not fully convinced that Canu generated the actual mitocontig, and even if it did, the log says it used only 5 reads (which might be enough as a backbone in some cases, but is definitely not extraordinary coverage). As mentioned in the paper, PacBio libraries can be depleted in mito reads. At the beginning you mentioned length of reads: 53,852,369,634. Does that mean your total sequencing depth is about 50x?

Can you share the bTaeGut2.contigs.fasta file that you find in the canu folder? Just copy-paste the sequence if it is hard to upload.

As an additional control, have you tried running the example in the README (https://github.com/VGP/vgp-assembly/tree/master/mitoVGP)? If not, please try it, just to rule out an issue with any of the software.

Lastly, is there any relationship between bTaeGut2, the name you gave your sample, and the VGP bTaeGut2 (https://vgp.github.io/genomeark/Taeniopygia_guttata/)?

gf777 commented 3 years ago

Hello Niloo,

unfortunately I cannot see the attached fasta file. Normally you should drag and drop it into the comment box. Please try to send it again.

Good that the pipeline finished correctly for the test case. Now we only need to find the best parameters for your species. Regarding the -s and -i options, these were just examples from the VGP; you can name them however you like, e.g. using your own species name and an ID that works for you. They are used to name files and folders in the output.
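To see where a given -s/-i pair sends the output, the layout can be read off the paths printed in the logs in this thread (species/ID/assembly_MT_rockefeller/intermediates). The helper below encodes that observed layout; it is inferred from the logs, not from mitoVGP documentation.

```shell
#!/bin/sh
# Sketch: working directory as it appears in this thread's logs.
# The layout <-s>/<-i>/assembly_MT_rockefeller/intermediates is inferred, not documented.
mito_workdir() {
    # $1 = value given to -s, $2 = value given to -i
    printf '%s/%s/assembly_MT_rockefeller/intermediates\n' "$1" "$2"
}

# e.g. where the Canu contigs for -s Oenanthe_melanoleuca -i Omelano would be written:
echo "$(mito_workdir Oenanthe_melanoleuca Omelano)/canu/Omelano.contigs.fasta"
```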

I am still confused about the coverage information you shared. The typical length of PacBio reads is 10-20 kb, definitely not 53852369634. I assumed this was the total number of bp sequenced, so, assuming a genome size of about 1 Gbp, I estimated the coverage at about 54x.
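The estimate is simply total bases divided by genome size. A small awk sketch of the arithmetic, where the 1 Gbp genome size is the rough assumption stated above, not a measured value:

```shell
#!/bin/sh
# Sketch: sequencing depth = total bases sequenced / genome size.
depth() {
    awk -v bases="$1" -v gsize="$2" 'BEGIN { printf "%.1f\n", bases / gsize }'
}

depth 53852369634 1000000000                  # PacBio total from the first post: 53.9
depth $((2 * 29145906146)) 1000000000         # 10x R1 + R2 totals: 58.3
```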

gf777 commented 3 years ago

test attachment

SRR7973880.meryl.hist.zip

gf777 commented 3 years ago

Hi Niloo, good that you have such high coverage. We should definitely be able to assemble the mito then. Unfortunately I still don't see the attachment. I carried out a quick test (see my message above). A .fasta file has to be compressed into a .zip for GitHub to accept it. Best, Giulio

gf777 commented 3 years ago

Hi Niloo, I think the problem is that you cannot attach files by replying to the email. You need to go to the GitHub page where this discussion is taking place (https://github.com/VGP/vgp-assembly/issues/44) and upload the file from there. Best

Niloofar-Alaei commented 3 years ago

Hi Giulio, please find the attached file; I hope you can finally get it: bTaeGut2.contigs.zip

gf777 commented 3 years ago

Hi Niloo,

got it now, thanks. The explanation is simple: there were no mtDNA reads that Canu could use for the assembly. As described in the paper, there are a few possible explanations: 1) library-prep related depletion of mtDNA from the sample (bad, no chance to assemble it); 2) a too distant or wrong reference (is it the same species/individual/family/order?). If your coverage is 200x, unless the prep is really selective for nuclear DNA (some are) or the tissue is really poor in mtDNA (some tissues are; what tissue was used here?), I find it unlikely that there are few or no mtDNA reads. Check, or send me, Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/tgs_MT_extracted_reads/bTaeGut2.fastq.gz to see whether it contains genuine mtDNA reads that somehow don't get assembled. If it does, we can potentially filter the reads to get the assembly to proceed.
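For a first look at the extracted reads (how many there are and how long they are), something like the following can be used. It is only a sketch and assumes plain 4-line FASTQ records with no wrapped sequence lines.

```shell
#!/bin/sh
# Sketch: summarize a gzipped FASTQ such as
# .../intermediates/tgs_MT_extracted_reads/bTaeGut2.fastq.gz
# Assumes standard 4-line FASTQ records (line 2 of each record is the sequence).
fastq_summary() {
    gzip -dc "$1" | awk '
        NR % 4 == 2 {
            n++; len = length($0); total += len
            if (len > max) max = len
        }
        END { printf "reads=%d total_bp=%.0f max_len=%d\n", n, total, max }'
}

# Example:
# fastq_summary bTaeGut2.fastq.gz
```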

Niloofar-Alaei commented 3 years ago

Hi dear Giulio

Thanks for your help. I found this file, bTaeGut2.fastq.gz, and I am sending it to you as well.

It contains 49 headers (i.e. 49 reads). I BLASTed some of them and the results show that they belong to the mitogenome.
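To BLAST the extracted reads, they can first be converted from FASTQ to FASTA. A minimal sketch, again assuming 4-line FASTQ records; the blastn call in the comment, which uses data_REF.fasta from this thread as the subject, is only an example invocation.

```shell
#!/bin/sh
# Sketch: gzipped 4-line FASTQ -> FASTA on stdout (header line loses its '@').
fastq_to_fasta() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { print ">" substr($0, 2) } NR % 4 == 2 { print }'
}

# Example:
# fastq_to_fasta bTaeGut2.fastq.gz > bTaeGut2.reads.fasta
# blastn -query bTaeGut2.reads.fasta -subject data_REF.fasta -outfmt 6
```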

Concerning the reference, I used Oenanthe isabellina; my data belong to another species of the genus Oenanthe (O. melanoleuca).

bTaeGut2.fastq.zip

gf777 commented 3 years ago

Dear Niloo,

now the situation is getting clearer. I have also BLASTed the reads and only a handful of them (8) are mitochondrial. This means the library is highly depleted in mtDNA. This situation is far from uncommon nowadays, because of the progressively more stringent size selection of CLR libraries. That said, that handful could be enough as a backbone, and the short reads should then allow us to attain the desired level of sequence accuracy. However, what is happening is that Canu is confounded by the high amount of spurious nDNA reads. You should be able to filter them out by adding the following options when running the tool:

-f 25000 -p 5

-f will remove all reads above 25 kb in length; -p will remove all reads with read coverage <5% of the mito.
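To illustrate what a maximum-length cutoff such as -f 25000 does conceptually, here is a standalone awk filter. It is not mitoVGP's own implementation, just a sketch of the same idea applied to a 4-line FASTQ stream.

```shell
#!/bin/sh
# Sketch (not mitoVGP code): keep only FASTQ records whose sequence is <= $1 bp.
# Reads a 4-line FASTQ stream on stdin and writes the kept records to stdout.
filter_max_len() {
    awk -v max="$1" '
        { rec[NR % 4] = $0 }                    # buffer the current 4-line record
        NR % 4 == 0 && length(rec[2]) <= max {  # record complete; rec[2] is the sequence
            print rec[1]; print rec[2]; print rec[3]; print rec[0]
        }'
}

# Example:
# gzip -dc reads.fastq.gz | filter_max_len 25000 > reads.max25k.fastq
```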

Because there are so few mtDNA reads, I would also add this option to prevent Canu from stopping:

-o "stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08"

Maybe try both with and without it.
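The two attempts can be scripted side by side. This sketch reads "with and without" as with and without the -o Canu options, and assumes the species, reference and list names used in this thread. Since -i names the output folders, each attempt gets its own (hypothetical) ID so the results do not mix. MITOVGP defaults to ./mitoVGP and can be overridden.

```shell
#!/bin/sh
# Sketch: run mitoVGP with the suggested read filters, with or without the relaxed Canu options.
# Species, reference and list names come from this thread; the IDs in the examples are hypothetical.
MITOVGP=${MITOVGP:-./mitoVGP}
CANU_OPTS="stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08"

run_filtered() {
    mode=$1; id=$2
    # Positional parameters are reused to hold the optional -o argument pair.
    if [ "$mode" = with_o ]; then set -- -o "$CANU_OPTS"; else set --; fi
    "$MITOVGP" -a pacbio -s Oenanthe_melanoleuca -i "$id" -r data_REF.fasta \
        -t "${NSLOTS:-1}" -1 list_pacbiodata.txt -2 list_10xdata.txt \
        -b variantCaller -f 25000 -p 5 "$@"
}

# Examples:
# run_filtered without_o Omelano_fp
# run_filtered with_o    Omelano_fpo
```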

Hope this helps. Best

--

Niloofar-Alaei commented 3 years ago

Dear Giulio

That sounds good; I will run it with these options added to the command.

Thanks for your help

Niloo

From: Giulio Formenti notifications@github.com Sent: 21 September 2020 15:25 To: VGP/vgp-assembly Cc: Niloofar Alaei Kakhki; Author Subject: Re: [VGP/vgp-assembly] time of running MITOVGP (#44)

Niloofar-Alaei commented 3 years ago

Hi dear Giulio

I used the same filtering steps you suggested, but I think they were too harsh for my data, because I received a BLAST options error: File Oenanthe_melanoleuca/Omelano/assembly_MT_rockefeller/intermediates/canu/Omelano.contigs.fasta is empty.

Then I used only -f (and left out -p), and it ran. I checked Omelano.contigs.fasta; it includes 3 contigs (>tig00000001 len=20322 reads=10 covStat=7.58, >tig00000003 len=17249 reads=3 covStat=3.05, >tig00000005 len=10807 reads=7 covStat=9.54), and the BLAST result shows that at least one of these contigs (tig00000005) is a mitogenome candidate. But I received this error message in the annotation step: --Annotating tRNAs: Unable to open Oenanthe_melanoleuca/Omelano/assembly_MT_rockefeller/intermediates/trimmed2/Omelano.tig00000005_polish2_10x1_trim1_10x2_trim2.fasta for reading. Aborting program.

There is no trimmed2 folder. It looks like the first round of short-read polishing worked well, but something probably went wrong in the final round of polishing (round 2), so the tRNA identification step stopped.

I also attached the log files.

Best, Niloo
log.zip

gf777 commented 3 years ago

Hi Niloo, I think your second attempt didn't work because none of the contigs from Canu really represented the mitogenome. Can you share the Omelano.contigs.fasta file? If so, we need to carefully select the reads and make sure Canu is able to assemble them. We are very close to the lower coverage boundary for a mito to be assembled. Also, please share the Canu log from the attempt using -f, -p and -o together.

Best

Niloofar-Alaei commented 3 years ago

Dear Giulio

I have attached the Omelano.contigs.fasta file and also the log files from the run using all of -f, -p and -o. Omelano.contigs.zip log.zip

gf777 commented 3 years ago

Hi Niloo, as suspected, the contigs generated without -p are nuclear DNA. They will be polished but of course cannot be circularized, as they do not represent the mito. What happens if you do not filter with -p is that the nuclear reads take over the few mtDNA reads, which either get collapsed into them or remain unassembled (you can check for those reads in the unassembled.fasta in the Canu output). The log files you just shared are missing the relevant Canu log. This is probably part of the main log when you run the tool. Can you share that too?

Niloofar-Alaei commented 3 years ago

canu-logs.zip

Dear Giulio, I see. But when I set -p 5 (I also tried -p 3), Canu produces an empty contigs.fasta.

In my last email I attached the files in the log folder. If I understand correctly, you need the files in the canu-logs folder (Oenanthe_melanoleuca-f25000-p5/Omelano/assembly_MT_rockefeller/intermediates/canu/canu-logs), so I am attaching them now.

The best

gf777 commented 3 years ago

Dear Niloo, sorry for not making this clear earlier but that is the internal log of canu. What is relevant here is the main log of canu (called canu.out and also .report, both in the main folder of canu). Also, you should have the full log of running the mitoVGP pipeline (the stdout when you run it).

Niloofar-Alaei commented 3 years ago

Dear Giulio, it's OK. In the canu folder only the .report file exists; I don't have canu.out. I also don't have the full log file you mentioned (stdout). I see all the folders, but these files don't exist. I attached the .report file. Omelano.zip

Niloofar-Alaei commented 3 years ago

Dear Giulio, I ran the mitoVGP pipeline using all of -f, -p and -o, and added 2> at the end of the command to capture the messages printed to the screen while the pipeline was running. Maybe it helps; I attached it. log.zip
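A note on capturing the run log: `2>` on its own redirects only standard error, while the main log Giulio asked for is what the pipeline prints to standard output. A minimal bash sketch for saving both streams to one file while still seeing them on screen. `fake_pipeline` is a hypothetical stand-in for `./mitoVGP`, and `run.log` is an arbitrary file name.

```bash
# Hypothetical stand-in for ./mitoVGP that writes to both stdout and stderr.
fake_pipeline() {
  echo "step 1: mapping reads"        # goes to stdout
  echo "warning: low coverage" >&2    # goes to stderr
}

# '2>&1' merges stderr into stdout; 'tee' prints the combined stream
# on screen and also writes it to run.log.
fake_pipeline 2>&1 | tee run.log

# For a real run the same pattern would be, e.g.:
#   ./mitoVGP -a pacbio ... 2>&1 | tee run.log
# or, without echoing to the screen:
#   ./mitoVGP -a pacbio ... > run.log 2>&1
```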

gf777 commented 3 years ago

Dear Niloo, yes the problem is that the mito coverage of this dataset is extremely low. Can you check/send me what's in the unassembled.fasta of canu? This will likely require manual effort.

Niloofar-Alaei commented 3 years ago

Dear Giulio, yes, that is why Canu produced the empty FASTA file when I set the coverage cutoff (-p). I looked at the unassembled.fasta. It includes these 6 contigs:

tig00000001 len=24290 reads=5 covStat=7.80 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
tig00000002 len=1929 reads=1 covStat=0.00 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
tig00000003 len=7624 reads=1 covStat=0.00 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
tig00000004 len=7354 reads=1 covStat=0.00 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
tig00000005 len=3180 reads=1 covStat=0.00 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
tig00000006 len=10617 reads=1 covStat=0.00 gappedBases=no class=unassm suggestRepeat=no suggestCircular=no
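The fields Canu writes into each header (len, reads, covStat) are enough for a quick first triage before running BLAST. Below is a hypothetical awk helper, `list_canu_contigs`, that prints those three fields for every contig whose length falls in a chosen window. The 15 to 25 kb window in the usage line is an illustrative assumption, not a mitoVGP or Canu setting.

```bash
# Hypothetical helper: print "name len reads covStat" for each contig in a
# Canu FASTA (e.g. *.contigs.fasta or *.unassembled.fasta) whose len= value
# lies between min_len and max_len. It relies on Canu's header format, e.g.
# ">tig00000001 len=24290 reads=5 covStat=7.80 ...".
list_canu_contigs() {
  local fasta="$1" min_len="$2" max_len="$3"
  awk -v min="$min_len" -v max="$max_len" '
    /^>/ {
      name = substr($1, 2); len = ""; reads = ""; cov = ""
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")               # "len=24290" -> kv[1]="len", kv[2]="24290"
        if (kv[1] == "len") len = kv[2]
        else if (kv[1] == "reads") reads = kv[2]
        else if (kv[1] == "covStat") cov = kv[2]
      }
      if (len != "" && len + 0 >= min && len + 0 <= max)
        print name, len, reads, cov
    }
  ' "$fasta"
}
```

For example, `list_canu_contigs Omelano.unassembled.fasta 15000 25000` would report only tig00000001 from the list above. Whether that contig is actually mitochondrial still has to be confirmed, for example with BLAST.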

and also you can find it as an attachment, Omelano.unassembled.zip

gf777 commented 3 years ago

Dear Niloo, the first contig is indeed your mito. Let's see if we can make this work without manual intervention. Try adding this to the canu options:

-o "stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08 contigFilter=\"2 0 1.0 0.5 0\" "

Courtesy of @skoren (https://canu.readthedocs.io/en/latest/faq.html#my-asm-contigs-fasta-is-empty-why).

You will probably have to escape the quotes, as I showed there, though I am not 100% sure that is the appropriate syntax.
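To check what the escaped quotes turn into once bash has parsed the command line, a hypothetical helper `show_args` can print each argument it receives in brackets, one per line. This only shows how bash tokenizes the command. It says nothing about how mitoVGP later passes the string on to Canu. The result can be compared with the "Canu options" line of the parameter summary mitoVGP prints.

```bash
# Print every received argument in brackets, one per line.
show_args() {
  printf '[%s]\n' "$@"
}

# The whole -o value arrives as ONE argument; the \" become literal quotes.
show_args -o "stopOnLowCoverage=0 contigFilter=\"2 0 1.0 0.5 0\" "
```

This prints two lines: `[-o]` and `[stopOnLowCoverage=0 contigFilter="2 0 1.0 0.5 0" ]`.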

Niloofar-Alaei commented 3 years ago

Hi dear Giulio, I ran mitoVGP with the contigFilter option.

I tried different patterns, with and without quotes, but in all of them Canu doesn't run at all, and the log file (Omelano_mtDNApipe_20200929-174427.out) reports an invalid command line option.

The best Niloo

From: Giulio Formenti notifications@github.com Sent: 24 September 2020 15:54 To: VGP/vgp-assembly Cc: Niloofar Alaei Kakhki; Author Subject: Re: [VGP/vgp-assembly] time of running MITOVGP (#44)


gf777 commented 3 years ago

Hi Niloo, I have just tested the suggested syntax on the VGP example and it did not cause me any trouble with canu and completed normally:

srun --partition=hpc --cpus-per-task=24 sh ./mitoVGP -a pacbio -s Mastacembelus_armatus -i fMasArm1 -r mtDNA_Mastacembelus_armatus.fasta -t 24 -b variantCaller -o "stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08 contigFilter=\"2 0 1.0 0.5 0\" "

Here is what I see in the parameter summary:

Species: -s Mastacembelus_armatus

Species ID: -i fMasArm1

Reference: -r mtDNA_Mastacembelus_armatus.fasta

Genome size: -g 16486

Number of threads: -t 24

Long read platform: -a pacbio

Canu options: -o stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08 contigFilter="2 0 1.0 0.5 0"

Working directory: Mastacembelus_armatus/fMasArm1/assembly_MT_rockefeller/intermediates

Collecting data using: aws s3 ls s3://genomeark/species/Mastacembelus_armatus/fMasArm1/genomic_data/pacbio/

How parameters are passed to the shell may vary, however. You could try my command above first and then see if you can adapt it to your case. Alternatively, you can run all the steps of the pipeline separately once you have the Canu output, but that would be slightly more complicated. Each script (under scripts) has its own help.

Let me know how it goes.

Giulio

Niloofar-Alaei commented 3 years ago

Hi dear Giulio

Sorry for the long delay in answering; something went wrong with our server and I couldn't access it for around two weeks.

Today I ran this command: ./mitoVGP -a pacbio -s Oenanthe_melanoleuca -i Omelano -r mtDNA_Oenanthe_isabellina.fasta -t 24 -1 data_pacbio.txt -2 data_10x.txt -f 25000 -p 5 -b variantCaller -o "stopOnLowCoverage=0 obtErrorRate=0.1 obtOvlErrorRate=0.1 utgErrorRate=0.08 utgOvlErrorRate=0.08 contigFilter=\"2 0 1.0 0.5 0\" " and fortunately it ran and finished without any error message, giving me one file as the mitogenome of my species. Its length is 18631 bp.

I think it should be my MT genome, am I right?

Thanks a lot for all your help Niloo

From: Giulio Formenti notifications@github.com Sent: 01 October 2020 05:16 To: VGP/vgp-assembly Cc: Niloofar Alaei Kakhki; Author Subject: Re: [VGP/vgp-assembly] time of running MITOVGP (#44)


gf777 commented 3 years ago

Dear Niloo,

sounds like it :-)

As an initial sanity check you could 1) BLAST it to see if the divergence is consistent with the proposed phylogeny, and 2) annotate it using Mitos2 http://mitos2.bioinf.uni-leipzig.de/index.py. If the assembly is good you should see no critical alerts.
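For the BLAST check, BLAST+ tabular output (`-outfmt 6`) can be summarised with a few lines of awk. The sketch below assumes the assembly was compared against a set of reference mitogenomes with a command such as `blastn -query mito.fasta -subject refs.fasta -outfmt 6`. The file names and the helper `best_hits` are hypothetical. Summing HSP lengths gives only a rough figure, because HSPs can overlap.

```bash
# Summarise BLAST+ -outfmt 6 output (default 12 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
# For each subject: total aligned length over all HSPs and best percent identity,
# sorted by total aligned length, largest first.
best_hits() {
  awk -F'\t' '
    { aln[$2] += $4; if ($3 > best[$2]) best[$2] = $3 }
    END { for (s in aln) printf "%s\t%d\t%.2f\n", s, aln[s], best[s] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}
```

Usage: `best_hits blast_hits.tsv | head`. Whether the identities fit the expected phylogeny is then a judgement call for the species involved.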

I am glad it worked out!

All the best,

Giulio

Niloofar-Alaei commented 3 years ago

Dear Giulio,

I am happy to be able to assemble the mitogenome 😊

As you suggested, I also annotated it using MITOZ, and it worked 😊. It just gives me these warnings:

Features not found: OL
Split/duplicated features: trnP, nad6, trnF, OH
Translational exceptions: start=nad6, stop=nad6,
Overlaps:(atp8,atp6):10; (cox1,trnS2):9; (nad4l,nad4):7; (trnF,rrnS):1; (rrnS,trnV):1; (trnQ,trnM):1; (nad2,trnW):1; (trnC,trnY):1; (trnS1,trnL1):1;

Thanks a lot, and best wishes,

Niloo

From: Giulio Formenti notifications@github.com Sent: 12 October 2020 15:28 To: VGP/vgp-assembly Cc: Niloofar Alaei Kakhki; Author Subject: Re: [VGP/vgp-assembly] time of running MITOVGP (#44)


gf777 commented 3 years ago

Hi Niloo,

this looks good to me, assuming that there is a gene duplication in the control region, which is very common in birds (as we detailed here: https://www.biorxiv.org/content/10.1101/2020.06.30.177956v2). I also assume the translational exception in nad6 affects the duplicated nad6 in the CR.

Good luck with your analyses!

Best,

Giulio

Niloofar-Alaei commented 3 years ago

Thanks a lot for all your help

The best

Niloo

From: Giulio Formenti notifications@github.com Sent: 12 October 2020 18:52 To: VGP/vgp-assembly Cc: Niloofar Alaei Kakhki; Author Subject: Re: [VGP/vgp-assembly] time of running MITOVGP (#44)



193019246 reads; of these:
  193019246 (100.00%) were paired; of these:
    193017531 (100.00%) aligned concordantly 0 times
    1715 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times

reading input file "Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref" of length 24497

construct suffix tree for sequence of length 24497

(maximum reference length is 536870908)

(maximum query length is 4294967295)

CONSTRUCTIONTIME /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.00

reading input file "/gpfs1/work/alaeikak/mitoVGP/Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/intermediate.fasta" of length 24496

matching query-file "/gpfs1/work/alaeikak/mitoVGP/Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/intermediate.fasta"

against subject-file "Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref"

COMPLETETIME /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.01

SPACE /home/alaeikak/anaconda3/envs/mitoVGP_pacbio/opt/mummer-3.23/mummer Taeniopygia_guttata/bTaeGut2/assembly_MT_rockefeller/intermediates/trimmed/bTaeGut2.tig00000001_polish2_10x1.ntref 0.05