Magdoll / Cogent

Coding Genome Reconstruction using Iso-Seq data
BSD 3-Clause Clear License

"segmentation error" running reconstruct_contig.py - version 6.1.0 #85

Open jcleple33 opened 3 years ago

jcleple33 commented 3 years ago

Hi Liz,

For about 60 clusters I get an error when running reconstruct_contig.py.

Example error report:

$ cat slurm-19441671.out
2.17-r941
/var/spool/slurm/d/job19441671/slurm_script : ligne 10 : 19632 Erreur de segmentation reconstruct_contig.py --nx_cycle_detection -k 300 ccs_APMX/1567_11137

(The shell message is in French: "line 10: 19632 Segmentation fault".)

I have tried k from 40 to 300, and I have increased memory up to 20G and CPUs up to 15, but I still get this segmentation error for these last 60 clusters. Do you have any idea what happened?

I attached the cluster file ccs_APMX/1567_11137 as an example

and my script:

@@@@@@@@@@
#!/bin/bash
#SBATCH -p workq
#SBATCH -t 00-04:00:00
#SBATCH --mem=20G
#SBATCH --cpus-per-task=20

module load system/Miniconda3-4.4.10
module load bioinfo/Cogent-6.1.0

reconstruct_contig.py --nx_cycle_detection -k 300 ccs_APMX/1567_11137
@@@@@@@@@@

Many thanks in advance for your help

ccs_APMX:1567_11137.zip
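When a job dies like this, it can help to confirm from the exit status that the process was really killed by a signal (a segmentation fault) and did not simply exit with an error. The following is a minimal Python sketch and not part of Cogent; the helper names `describe_exit` and `run_and_describe` are made up for illustration. It relies on two POSIX conventions: `subprocess` reports death by signal as a negative return code, and shells report it as 128 plus the signal number.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Return a short human-readable description of a process exit status."""
    if returncode == 0:
        return "ok"
    # subprocess reports death-by-signal as a negative return code (POSIX);
    # a shell wrapper reports it as 128 + signal number instead.
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exit code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a signal number known on this platform.
        name = f"signal {signum}"
    return f"killed by {name}"


def run_and_describe(cmd):
    """Run a command (given as a list of arguments) and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)
```

For example, `run_and_describe(["reconstruct_contig.py", "--nx_cycle_detection", "-k", "300", "ccs_APMX/1567_11137"])` would be expected to report a SIGSEGV for a run that ends like the one in the log above.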

jcleple33 commented 3 years ago

I have tried with version 6.0.0, and the reconstruction completed with reconstruct_contig.py --nx_cycle_detection -k 60 ccs_APMX/1567_11137 (10G of memory and 10 CPUs). So maybe a bug was introduced in 6.1.0 compared to 6.0.0?

Magdoll commented 3 years ago

Hi @jcleple33, similar to issue 83, I am also able to complete Cogent on this with reconstruct_contig.py. See: https://www.dropbox.com/s/kazc3yqxhohc1xf/ccs_APMX.tar.gz?dl=0

jcleple33 commented 3 years ago

Hi Liz,

Indeed, increasing k makes it possible to reconstruct most of the clusters. However, for a few of them I am not sure it will ever succeed. Below are the k values I have tested or am currently testing, so far without success, and I really don't know whether it is realistic to set k this high... Should I stop and forget about these clusters? Do you have any idea that could help me? Many thanks in advance, Jean-Charles

for ccs_APMX/55816_70 - 560 transcripts
jleple@genologin2 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_55816_70
…
Submitted batch job 19520073 give k 500 failed
Submitted batch job 19557229 give k 1000 failed
Submitted batch job 19557238 give k 1500 failed
Submitted batch job 19557259 give k 2000 failed
Submitted batch job 19557262 give k 2500 failed
Submitted batch job 19557268 give k 3000 failed
Submitted batch job 19557348 give k 4000 failed
Submitted batch job 19557364 give k 5000 failed
Submitted batch job 19557439 give k 6000 failed
Submitted batch job 19557541 failed, I stop here
STOP : "Cogent.splice_cycle - INFO - K-mer in-seq cycle detection: None found at k=6000"

for ccs_APMX/89607_63 - 14083 transcripts
jleple@genologin2 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_89607_63
…
Submitted batch job 19520088 give k 300 failed
Submitted batch job 19557211 give k 400 failed
Submitted batch job 19557261 give k 500 failed
Submitted batch job 19557443 give k 600 failed
Submitted batch job 19558302 give k 700 failed
Submitted batch job 19558766 give k 800 failed
Submitted batch job 19560385 give k 900 failed
Submitted batch job 19560403 give k 1000 failed
Submitted batch job 19569875 running

for ccs_APMX/1567_24 - 9267 transcripts
jleple@genologin2 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_1567_24
…
Submitted batch job 19520099 give k 2000 failed
Submitted batch job 19569877

for ccs_APMX/8512_1009 - 2482 transcripts
jleple@genologin2 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_8512_1009
…
Submitted batch job 19520106 give k 300 failed
Submitted batch job 19557212 give k 400 failed
Submitted batch job 19557441 give k 500 failed
Submitted batch job 19558300 give k 600 failed
Submitted batch job 19560388 give k 700 failed
Submitted batch job 19569878 running

for ccs_APMX/33030_715 - 3367 transcripts
jleple@genologin2 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_33030_715
…
Submitted batch job 19520115 give k 300 failed
Submitted batch job 19557215 give k 400 failed
Submitted batch job 19557225 give k 500 failed
Submitted batch job 195572415 give k 600 failed
Submitted batch job 19560397 give k 700 failed
Submitted batch job 19571374 running

for ccs_APMX/10040_489 - 8311 transcripts
jleple@genologin1 /work/jleple/PACBIOII/E656_SMRTCell_A_X_P_M_merged/COGENT_APMX $ sarray -J JCrerun --cpus-per-task=10 --mem=10G rerun_10040_489
…
Submitted batch job 19557216 give k 1500 failed
Submitted batch job 19571945 running
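The manual resubmissions above amount to a sweep over k for one cluster. As a rough sketch only, the same loop could be written in Python. The command line is the one used in this thread; the function names `build_command` and `sweep_k` are hypothetical, and treating exit code 0 as success is an assumption about how reconstruct_contig.py reports failure.

```python
import subprocess


def build_command(cluster_dir, k):
    """Command line used in this thread, with the k-mer size filled in."""
    return ["reconstruct_contig.py", "--nx_cycle_detection", "-k", str(k), cluster_dir]


def sweep_k(cluster_dir, k_values, run=subprocess.run):
    """Try each k in order and stop at the first run that exits with status 0.

    Returns (k, log) for the first success, or (None, log) if every k failed.
    `run` is injectable so the loop can be exercised without Cogent installed.
    """
    log = []
    for k in k_values:
        result = run(build_command(cluster_dir, k))
        # Assumption: a zero exit status means the reconstruction completed.
        log.append((k, result.returncode))
        if result.returncode == 0:
            return k, log
    return None, log
```

Calling `sweep_k("ccs_APMX/8512_1009", range(300, 1100, 100))` would try k = 300, 400, ..., 1000 in turn. On a cluster, each run would still need to go through the scheduler, as in the sarray commands above.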

On 12 Sept. 2020 at 05:21, Elizabeth Tseng notifications@github.com wrote:

Hi @jcleple33, similar to issue 83, I am also able to complete Cogent on this with reconstruct_contig.py. See: https://www.dropbox.com/s/kazc3yqxhohc1xf/ccs_APMX.tar.gz?dl=0

Magdoll commented 3 years ago

Hi @jcleple33 ,

Can you take a look at the input sequence for these particular failed Cogent families? My guess is these are highly repetitive sequences, in which case it is very challenging for Cogent to work.

If you want me to take a look at a few examples, let me know.

-Liz
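One way to check the input sequences for repetitiveness is to look, per transcript, at the longest homopolymer run and at whether any k-mer occurs twice within the same sequence. The sketch below is a rough illustration and not Cogent's own cycle detection; the function names are hypothetical, and the FASTA parsing is deliberately minimal.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def has_repeated_kmer(seq, k):
    """True if any k-mer occurs at more than one position in seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def repeat_report(records, k):
    """One tuple per record: (name, length, longest homopolymer, repeated k-mer?)."""
    return [
        (name, len(seq), longest_homopolymer(seq), has_repeated_kmer(seq, k))
        for name, seq in records
    ]
```

With a cluster's FASTA file opened as `handle`, `repeat_report(parse_fasta(handle), 300)` lists which transcripts still contain a repeated 300-mer. Storing every k-mer as a string is memory-hungry for large k, so this is only meant for inspecting a handful of families.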

jcleple33 commented 3 years ago

Hi Liz, I checked the transcripts and you are right: homopolymers seem to be prominent. In cluster 55816_70 I found that 50% of the sequences carry LTRs, so they may be TEs... But what can be done at this stage? I am sending you the clusters (in separate posts, because together they exceed 10 MB). If you could look at some of them, it would be very helpful to me.

55816_70.zip 33030_715.zip

JC

jcleple33 commented 3 years ago

other cluster 89607_63.zip

jcleple33 commented 3 years ago

other cluster 10040_489.zip

jcleple33 commented 3 years ago

and the last one 1567_24.zip