PombertLab / SYNY

The SYNY pipeline investigates synteny between species by reconstructing protein clusters from gene pairs.
MIT License

Issue using --ranges #6

Closed sa-andre closed 1 week ago

sa-andre commented 1 week ago

Hello, I am trying once again the analysis using ranges and it isn't working.

I installed SYNY via conda and tried the example file, which completed correctly. When I used my own files with --ranges, it didn't work. I tried the same files and scaffolds with --include (scaffold names only) and it worked, so the problem seems to be with the ranges I am using. I double-checked the ranges and couldn't find any error. I tried both mashmap and minimap; each run finishes but fails to generate any alignment (as far as I understand, the error seems to be in the PAF alignment). I am attaching the error files for both mashmap and minimap. The genomes I am using are the ones you suggested that contain annotations. One of the error files suggests the process was killed by the Linux out-of-memory (OOM) killer; however, I ran with --include without any OOM errors, and that should be more demanding than --ranges, so I am not sure memory is the actual problem.

wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/904/425/465/GCF_904425465.1_Colossoma_macropomum/GCF_904425465.1_Colossoma_macropomum_genomic.gbff.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/015/220/715/GCF_015220715.1_fPygNat1.pri/GCF_015220715.1_fPygNat1.pri_genomic.gbff.gz

ranges1.txt minimapsyny.log minimaperror.log mashmapsyny.log mashmaperror.log

Pombert-JF commented 1 week ago

The issue stems from list_maker.pl. Working on a fix (the code was missing an if (exists $ranges{$contig}){ } condition).
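For readers unfamiliar with this kind of guard, here is a minimal sketch of the idea in Python. It is not the code from list_maker.pl (which is Perl), and everything in it is an assumption made for illustration: the function name `in_requested_range`, the layout of the `ranges` table (contig name mapped to a list of start/end pairs), the contig names, the coordinates, and the choice to keep any feature that overlaps a requested subrange. It only shows why a membership check, the analogue of `exists $ranges{$contig}`, is needed before looking up subranges for a contig.

```python
# Illustrative sketch only; not SYNY's actual implementation.

def in_requested_range(contig, start, end, ranges):
    """Return True if a feature overlaps a subrange requested for its contig.

    `ranges` maps contig name -> list of (range_start, range_end) tuples
    (hypothetical layout, assumed for this example).
    """
    # Guard, analogous to Perl's `if (exists $ranges{$contig}) { ... }`:
    # contigs with no requested subranges are skipped instead of looked up.
    if contig not in ranges:
        return False
    # Keep the feature if it overlaps at least one requested subrange.
    return any(start <= r_end and end >= r_start
               for r_start, r_end in ranges[contig])


# Hypothetical contig names and coordinates.
ranges = {"contig_A": [(1000, 5000)]}
features = [
    ("contig_A", 1200, 1800),   # inside the requested subrange
    ("contig_A", 9000, 9500),   # same contig, outside the subrange
    ("contig_B", 100, 400),     # contig absent from the ranges table
]

kept = [f for f in features if in_requested_range(*f, ranges)]
print(kept)  # [('contig_A', 1200, 1800)]
```

In this sketch, dropping the `contig not in ranges` check would make `ranges[contig]` raise a `KeyError` for `contig_B`; the equivalent failure mode in the Perl script may differ.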

The putative fix works fine on the mashmap alignments. Running the DIAMOND searches is surprisingly slow, however. Will likely have to run it overnight to test it properly. Might take a day or two to get a clean fix.

Pombert-JF commented 1 week ago

list_maker.pl is now fixed and works properly with subranges. Also had to fix an issue with isoforms that resulted in concatenated strings and abnormally long runtimes (it was messing up DIAMOND homology searches). The new version has been pushed to GitHub.

Running the new version on your data with run_syny.pl -a *.gbff.gz --ranges ranges1.txt --aligner mashmap --out SUBRANGESmash -g 0 1 5 resulted in:

[Images: GCF_015220715_vs_GCF_904425465, gap_5 and mmap, 1e5 19 2x10 8 blue]

[Images: GCF_015220715_vs_GCF_904425465, gap_5 and mmap, barplot 19 2x10 8 Spectral]

sa-andre commented 1 week ago

I tried using minimap (the default) without --gaps and it worked all right. Thanks again!

Why is it now generating two dotplot graphs, one that says minimap and the other that says gap?

Pombert-JF commented 1 week ago

The .mmap files are the plots for the minimap2/mashmap3 genome alignments. The .gap files are the plots generated from the gene cluster inferences.

Will close this issue as resolved.