aseyboldt / fastq-rs

MIT License

each_zipped and parse_path #7

Open wwood opened 5 years ago

wwood commented 5 years ago

Hi,

fastq looks good - but I'm wondering if there's a way to iterate multiple fastqs together when each of them may or may not be compressed? My current attempt basically involves munging the source of each of these methods together.

Also, while I'm talking feature requests, any chance of fasta support? With that it would be a nice kseq.h replacement.

Thanks, ben
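
For readers following the thread, the lockstep iteration asked about above can be sketched with the standard library alone. This is not the fastq-rs API: `Record`, `next_record` and `each_zipped_sketch` are hypothetical names, and the parser assumes strict four-line records with no validation. The point is only that once every input is boxed as a `Box<dyn BufRead>`, it no longer matters whether a decompressor sits underneath it.

```rust
use std::io::{self, BufRead, Cursor};

/// One FASTQ record (hypothetical type; the '+' separator line is dropped).
#[derive(Debug, PartialEq, Eq)]
struct Record {
    id: String,
    seq: String,
    qual: String,
}

/// Read the next four-line record, or `None` at a clean end of input.
/// Assumes strict four-line FASTQ; header and quality are not validated.
fn next_record(reader: &mut dyn BufRead) -> io::Result<Option<Record>> {
    let mut lines = [String::new(), String::new(), String::new(), String::new()];
    for (i, line) in lines.iter_mut().enumerate() {
        if reader.read_line(line)? == 0 {
            if i == 0 {
                return Ok(None);
            }
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated record"));
        }
        // Drop the trailing line terminator.
        let kept = line.trim_end_matches(|c: char| c == '\n' || c == '\r').len();
        line.truncate(kept);
    }
    let [id, seq, _plus, qual] = lines;
    Ok(Some(Record { id, seq, qual }))
}

/// Call `f` with one record from every input per step.
/// Stops as soon as any input is exhausted and returns the number of steps.
fn each_zipped_sketch<F>(inputs: &mut [Box<dyn BufRead>], mut f: F) -> io::Result<usize>
where
    F: FnMut(&[Record]),
{
    let mut steps = 0;
    if inputs.is_empty() {
        return Ok(steps);
    }
    loop {
        let mut batch = Vec::with_capacity(inputs.len());
        for input in inputs.iter_mut() {
            match next_record(&mut **input)? {
                Some(rec) => batch.push(rec),
                None => return Ok(steps),
            }
        }
        f(&batch);
        steps += 1;
    }
}

fn main() -> io::Result<()> {
    // In-memory stand-ins for two files; a decompressing reader wrapped in a
    // BufReader could be boxed here in exactly the same way.
    let r1: Box<dyn BufRead> = Box::new(Cursor::new("@a/1\nACGT\n+\nIIII\n"));
    let r2: Box<dyn BufRead> = Box::new(Cursor::new("@a/2\nTTGG\n+\nIIII\n"));
    let mut inputs = vec![r1, r2];
    let steps = each_zipped_sketch(&mut inputs, |recs| {
        println!("{} <-> {}", recs[0].id, recs[1].id);
    })?;
    println!("{} record sets", steps);
    Ok(())
}
```

Boxing the readers costs a dynamic dispatch per read call, which is usually negligible next to decompression. A real implementation would also want to report inputs of unequal length instead of stopping silently.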

kisakesenhi commented 2 years ago

Hi Ben,

I've tried to do this with the flate2 crate for fastq files. However, with concatenated gzip files the parsing stopped as soon as it reached the end of the first gzip stream.

I've used rust_htslib::bgzf::Reader instead, which doesn't have this problem. It also provides a reader that works the same way whether the file is gzipped or plain text. https://github.com/kisakesenhi/mgi_read_converter

Best,

Ibrahim

natir commented 2 years ago

Hi @wwood and @kisakesenhi

@kisakesenhi maybe you could use niffler to transparently read raw, gz, bz2 and xz files. It could be added behind a feature flag; I could write a PR if you want.

Best
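
As background on what "transparently" can mean here: compressed formats are commonly recognised by their leading magic bytes. The following is a minimal std-only sketch of that idea, not niffler's actual code; `Format`, `sniff` and `detect` are hypothetical names. It assumes the reader's first `fill_buf` call returns at least six bytes, which a real implementation should not rely on.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Formats told apart by their magic numbers (hypothetical enum).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Format {
    Gzip,
    Bzip2,
    Xz,
    Raw,
}

/// Classify a byte prefix: gzip starts with 1f 8b, bzip2 with "BZh",
/// xz with fd '7' 'z' 'X' 'Z' 00. Anything else is treated as raw.
fn sniff(prefix: &[u8]) -> Format {
    if prefix.starts_with(&[0x1f, 0x8b]) {
        Format::Gzip
    } else if prefix.starts_with(b"BZh") {
        Format::Bzip2
    } else if prefix.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Format::Xz
    } else {
        Format::Raw
    }
}

/// Peek at the buffered bytes without consuming them, so the same reader
/// can afterwards be handed to a decompressor or directly to a parser.
fn detect<R: BufRead>(reader: &mut R) -> io::Result<Format> {
    Ok(sniff(reader.fill_buf()?))
}

fn main() -> io::Result<()> {
    let mut plain = BufReader::new(Cursor::new(b"@read1\nACGT\n+\nIIII\n".to_vec()));
    let mut gz = BufReader::new(Cursor::new(vec![0x1f, 0x8b, 0x08, 0x00]));
    println!("{:?} {:?}", detect(&mut plain)?, detect(&mut gz)?);
    Ok(())
}
```

Because `fill_buf` does not consume input, detection and decoding can share one reader, which is what makes this approach usable on pipes as well as files.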

kisakesenhi commented 2 years ago

@natir, thank you very much, it seems interesting. Let's discuss this privately. I guess my problem with concatenated files will persist if it uses the reader from the flate2 library directly.

We first need this https://github.com/rust-lang/flate2-rs/issues/315 to be fixed. niffler seems to be a better option than using the htslib reader.

natir commented 2 years ago

Doesn't using MultiGzDecoder solve your issue?

kisakesenhi commented 2 years ago

That solves the issue, thanks. I checked niffler, and it also uses MultiGzDecoder. Would you like to implement it in a PR?

Ideally, I would like it to be a code base for editing fastq files (trimming, etc.), so it would be great for the output to have the same type and compression as the input.

natir commented 2 years ago

I'll start working on it :)