GATB / DiscoSnp

DiscoSnp is designed for discovering all kinds of SNPs (not only isolated ones), as well as insertions and deletions, from raw set(s) of reads.
https://gatb.inria.fr/software/discosnp/
GNU Affero General Public License v3.0
38 stars, 20 forks

EXCEPTION: cannot open ./trashme_3684_dsk_partitions_gatb/parts.360 #31

Closed githubgig closed 2 years ago

githubgig commented 2 years ago

Hello,

I previously ran DiscoSnp (reference-free) successfully with c 3, k31, b0 and no low complexity, but the genotyping rate after filtering is very low (0.036). I tried changing to k51 and b1 to improve that, but I'm getting the error below. Why is the genotyping rate so low in the first run, and what can I do to improve it? How can I try different parameters without getting this error? I can see that the folder trashme_3684_dsk_partitions_gatb was created, but the run crashed.

[DSK: Pass 1/2, Step 1: partitioning ] 25 % elapsed: 207 min 7 sec remaining: 621 min 22 sec cpu: 686.9 % mem: [2744, 2744, 2744] MB
Error: can't create output directory (./trashme_3684_dsk_partitions_gatb/)
debug, doesexist:0
created directory ./trashme_3684_dsk_partitions_gatb/

EXCEPTION: cannot open ./trashme_3684_dsk_partitions_gatb/parts.360 Too many open files
there was a problem with graph construction
$ reset

Thanks.
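
For readers hitting the same message: "Too many open files" is the operating system's error for a process that has reached its limit on open file descriptors, and the failing path (parts.360) suggests the partitioning step had many partition files open at once. The following is a minimal sketch, not part of DiscoSnp, that inspects that limit on a Unix-like system and tries to raise the soft limit. The function names are hypothetical, and whether a higher limit is enough for a given run is an assumption to verify. In a shell, the usual equivalent is the `ulimit -n` builtin.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit to `target`, never beyond the hard limit.

    Returns the soft limit in effect afterwards (unchanged if it could not be raised).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already unlimited or already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the system refused the new value; keep the old one
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
    # 4096 is an arbitrary illustrative value, not a DiscoSnp recommendation.
    print("soft limit now:", raise_soft_limit(4096))
```

A limit changed this way applies to the Python process and to any child process it starts afterwards, so it only helps a tool that is launched from the same process or shell session.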

pierrepeterlongo commented 2 years ago

Hi

Pierre

githubgig commented 2 years ago

Hi Pierre,

It's the proportion of genotypes present per marker (here only 0.036, or 3.6%, are present), which basically means that 0.964 (or 96.4%) of the data is missing. The total would probably be the average/mean across all markers.
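
The definition above can be sketched as a small calculation. This is only an illustration: the toy genotype table is invented, the function names are hypothetical, and it assumes missing calls are encoded as `./.` (the VCF convention) and that every marker has at least one sample.

```python
def genotyping_rate_per_marker(genotypes, missing="./."):
    """For each marker (row), return the fraction of samples with a genotype call."""
    rates = []
    for calls in genotypes:
        present = sum(1 for g in calls if g != missing)
        rates.append(present / len(calls))
    return rates


def mean_genotyping_rate(genotypes, missing="./."):
    """Average the per-marker genotyping rates across all markers."""
    rates = genotyping_rate_per_marker(genotypes, missing)
    return sum(rates) / len(rates)


# Invented toy data: 2 markers (rows) x 4 samples (columns).
toy = [
    ["0/1", "./.", "./.", "./."],  # 1 of 4 calls present
    ["0/0", "1/1", "./.", "0/1"],  # 3 of 4 calls present
]

print(genotyping_rate_per_marker(toy))  # [0.25, 0.75]
print(mean_genotyping_rate(toy))        # 0.5
```

With the figure reported in this thread, a mean rate of 0.036 corresponds to a missing fraction of 1 - 0.036 = 0.964.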

pierrepeterlongo commented 2 years ago

So, as I understand it, you expect to find a set of markers but you're finding only a tiny portion of them, right? In this case, you may try less stringent parameters.

pierrepeterlongo commented 2 years ago

I'm closing this issue. Please reopen it if your question was not answered in May.