no filenames list built

Qvdauwer commented 1 year ago

Dear James,

I am trying to demultiplex multi-read fast5 files so that I can use the fast5 files per barcode. This tool looks perfect for this application, but unfortunately I am unable to get it to work.

I tried running the following command:

python /opt/SquiggleKit/fast5_fetcher_multi.py -v \
    -q /home/qvdauwer/PhD/WP1/circulomics/Guppy6/reads/Circulomics_barcode5.fastq.gz \
    -s /home/qvdauwer/PhD/WP1/circulomics/Guppy6/sequencing_summary.txt.gz \
    -m /home/qvdauwer/Fast5 \
    -o ./fast5_bc5_circulomics

And even though it seems to pass the initial checks, there seems to be an error when it tries to get fast5 file names using seq_sum.

I added the verbose output:

Verbose mode active - dumping info to stderr
SquiggleKit fast5_fetcher: 1.3.0
args: Namespace(OSystem='Linux', f5_format='multi', fastq='/home/qvdauwer/PhD/WP1/circulomics/Guppy6/reads/Circulomics_barcode7.fastq', flat=None, index=None, multi_f5='/home/qvdauwer/Fast5', output='./fast5_bc7_circulomics', paf=None, pppp=False, prefix='trimmed', seq_sum='/home/qvdauwer/PhD/WP1/circulomics/Guppy6/sequencing_summary.txt', seq_sum_1D2=None, threshold=4000, trim=False, trim_list=None, verbose=True, version=False)
Multi-fast5 mode detected in mode: multi
Output folder './fast5_bc7_circulomics' created
Checks passed!
Starting things up!
Getting multi-fast5 info...
no filenames list built, check inputs
Extracting reads from multi-fast5 files...
No file paths built
done!

I looked at the other issues in this repository and my problem seems to be similar to the one in #55. So I guess there might be a naming problem somewhere, but I did not manage to find what the exact issue is.

Here are the first few lines of the sequencing summary file, if that helps: sequencing_summary_first_lines.txt

Thank you in advance for your help,

Quentin

Psy-Fer commented 1 year ago

Hello Quentin,

I'll have a look into this. Is the goal here to split your fast5 files into your barcode groups?

James

Psy-Fer commented 1 year ago

Ahh, I think I found the problem.

So first, they changed the header in the sequencing summary file from `filename_fast5 read_id` to `filename read_id`.

So the column detection code doesn't work.

Second, I made an error in the matching code.

Let me fix both of these.
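
For readers hitting the same message, the header change described above can be handled by accepting either column name. The following is a minimal sketch, not the code from SquiggleKit or from the fix commit: the helper names (`find_columns`, `map_reads_to_files`) are hypothetical, and it assumes a tab-separated summary with a `read_id` column plus either `filename_fast5` or `filename`.

```python
import csv
import gzip

# Hypothetical helpers for illustration; not SquiggleKit's actual implementation.
# Older summaries use "filename_fast5", newer ones use "filename".
FILENAME_COLUMNS = ("filename_fast5", "filename")


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def find_columns(header):
    """Return (filename_index, read_id_index) for a summary header row."""
    filename_idx = next(
        (header.index(name) for name in FILENAME_COLUMNS if name in header),
        None,
    )
    if filename_idx is None or "read_id" not in header:
        raise ValueError(
            "expected read_id and one of %s in header, got %s"
            % (FILENAME_COLUMNS, header)
        )
    return filename_idx, header.index("read_id")


def map_reads_to_files(handle, wanted_ids):
    """Map each wanted read ID to the fast5 file named in the summary."""
    reader = csv.reader(handle, delimiter="\t")
    filename_idx, read_id_idx = find_columns(next(reader))
    needed = max(filename_idx, read_id_idx)
    return {
        row[read_id_idx]: row[filename_idx]
        for row in reader
        if len(row) > needed and row[read_id_idx] in wanted_ids
    }
```

Raising an error when neither column name is present makes a changed header fail loudly, rather than ending with an empty filename list as in the verbose output above.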

Psy-Fer commented 1 year ago

Okay, can you do a git pull in the repo and try again?

Potential fix made: https://github.com/Psy-Fer/SquiggleKit/commit/adbc52e5f1fa06b9b7c7f4a9b71f48c9fba132bb

James

Psy-Fer commented 1 year ago

Also, I should probably mention that another way to do this is to convert the files to slow5 using slow5tools. It is then quite simple to extract the read IDs from the fastq file for each barcode and use them to extract the matching slow5 records into their own slow5 file. If you wish to stay in fast5, you can convert back again.

slow5tools has much more mature checks and more up-to-date handling of fast5 files and nanopore data than SquiggleKit (and the new version of SquiggleKit I'm working on will primarily be built upon slow5). So it's just another option if this tool doesn't provide all the features you are looking for.
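
The read ID extraction step mentioned above could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, it assumes standard four-line FASTQ records with the read ID as the first whitespace-delimited token of each header line, and it writes one ID per line. How that list is passed to slow5tools is not shown here; check the slow5tools documentation for the exact commands.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_read_ids(handle):
    """Yield the read ID from each record, assuming 4 lines per record."""
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip sequence, separator, and quality lines
        line = line.strip()
        if not line:
            continue  # tolerate a trailing blank line
        parts = line[1:].split()
        if not line.startswith("@") or not parts:
            raise ValueError("not a FASTQ header at line %d" % (line_number + 1))
        yield parts[0]


def write_id_list(fastq_path, list_path):
    """Write one read ID per line to list_path and return how many were written."""
    count = 0
    with open_text(fastq_path) as fastq, open(list_path, "w") as out:
        for read_id in fastq_read_ids(fastq):
            out.write(read_id + "\n")
            count += 1
    return count
```

Running `write_id_list` once per barcode fastq would give one ID list per barcode, which can then drive the per-barcode extraction.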

James

Qvdauwer commented 1 year ago

Hey James,

My goal is indeed to split my fast5 files by barcode. I tried the updated version and it seems to be working fine.

Also, I will take a look at the slow5tools you recommended as an alternate option.

Thank you for your help and the additional advice.

Quentin

Psy-Fer / SquiggleKit

no filenames list built #59