algbio / ggcat

Compacted and colored de Bruijn graph construction and querying
MIT License

Issues with -d and -c flags (ggcat doesn't find sample in large color_mapping.in file) #47

Closed CamilaDuitama closed 3 months ago

CamilaDuitama commented 4 months ago

Hi.

I am currently running ggcat with a color_mapping.in file of 360 samples (both single and paired-end reads). The command I am running is as follows:

ggcat build -k 31 -c -d color_mapping.in -j 20 -m 600

I get the output you can see in the ggcat.log file. Although ggcat adds the index with a color for each sample without problem, it eventually throws an error as if a file were not found:

Panic: panicked at crates/io/src/sequences_stream/fasta.rs:15:14: Error while opening file /pasteur/appa/scratch/cduitama/RascovanProject/fastq_files/aOralNonHuman/SRR6877286_1.fastq.gz : Os { code: 2, kind: NotFound, message: "No such file or directory" }

Backtrace: 0: <unknown> 1: <unknown> 2: <unknown> 3: <unknown> 4: <unknown> 5: <unknown> 6: <unknown> 7: <unknown> 8: <unknown> 9: <unknown> 10: <unknown> 11: <unknown> 12: <unknown> 13: <unknown> 14: <unknown> 15: <unknown> 16: <unknown> 17: __libc_start_main 18: <unknown>

As an additional test, I took the sample reported as not found and created a smaller color_mapping.in file containing it. With the same parameters, ggcat runs without problem. So for some reason ggcat finds the sample in a small color_mapping.in file (2 samples) but not in the large one (360 samples).

Thanks!

Camila

ggcat.log

CamilaDuitama commented 3 months ago

There was an error in the file paths: there were some hidden '\s' (whitespace) characters at the end of each line, and ggcat was interpreting them as part of the file path.
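
For anyone hitting the same panic, here is a minimal sketch of how one might check a mapping file for this problem before running ggcat. It is not part of ggcat. It assumes each line of color_mapping.in has the form color name, tab, file path, and the helper names `clean_mapping_lines` and `missing_paths` are made up for this example.

```python
import os


def clean_mapping_lines(lines):
    """Strip trailing whitespace from each line and drop empty lines.

    Returns (cleaned_lines, changed_line_numbers), where the line numbers
    are 1-based and refer to lines that carried trailing whitespace.
    """
    cleaned, changed = [], []
    for lineno, raw in enumerate(lines, start=1):
        body = raw.rstrip("\n")   # remove only the line terminator
        fixed = body.rstrip()     # drops trailing spaces, tabs, and '\r'
        if fixed != body:
            changed.append(lineno)
        if fixed:
            cleaned.append(fixed)
    return cleaned, changed


def missing_paths(cleaned_lines):
    """Return the paths that do not exist on disk.

    Assumption: the path is the last tab-separated field of each line.
    """
    missing = []
    for line in cleaned_lines:
        path = line.split("\t")[-1]
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

To use it, read the file with `open("color_mapping.in", newline="").readlines()` so that Windows-style `\r` characters are preserved for detection, pass the result to `clean_mapping_lines`, write the cleaned lines to a new file, and give that file to `-d`. Calling `missing_paths` on the cleaned lines lists any path that still cannot be found.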