egonozer / in_silico_pcr

Perl script for simulating PCR reactions. Extract sequences from a query based on primer sequences.
GNU General Public License v3.0

new version supporting fastq as input and output #1

Closed: splaisan closed this issue 5 years ago

splaisan commented 5 years ago

Hi, would you by chance already have an adapted version that accepts read data (FASTQ) as input and keeps the clipped qualities in the output (FASTQ)? That would be extremely cool for extracting amplicons from sequenced reads. I am looking into coding this from your code, but the input parsing needs some changes to handle the 4-line FASTQ records. Thanks for any help with that.
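The key requirement is that the quality string must be clipped with exactly the same offsets as the sequence, so every kept base keeps its own score. A minimal Perl sketch of that step, assuming the amplicon's 0-based start and length are already known; `clip_fastq_record` is a hypothetical helper, not part of in_silico_pcr:

```perl
use strict;
use warnings;

# Hypothetical helper: clip one FASTQ record to the region [$start, $start + $len).
# Sequence and quality are cut with the same offsets, so each base keeps its score.
sub clip_fastq_record {
    my ($id, $seq, $qual, $start, $len) = @_;
    die "sequence and quality lengths differ\n" unless length($seq) == length($qual);
    die "clip region out of range\n" if $start < 0 || $start + $len > length($seq);
    my $sub_seq  = substr($seq,  $start, $len);
    my $sub_qual = substr($qual, $start, $len);
    return "$id\n$sub_seq\n+\n$sub_qual\n";
}

# Example: keep bases 3..8 (1-based) of a 10-base read, i.e. 0-based start 2, length 6
print clip_fastq_record('@read1', 'ACGTACGTAC', 'IIIIHHHHGG', 2, 6);
```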

egonozer commented 5 years ago

Great idea, but I think that would end up being a different program, not necessarily something to incorporate into this one. I worry a bit that the regular-expression-based approach would be a lot slower when parsing read files, but it would be hard to know without trying.
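To make the regular-expression approach concrete, here is a sketch of how a degenerate primer pair can be turned into one amplicon pattern. The IUPAC table, `primer_to_regex`, `revcomp`, `amplicon_regex` and `find_amplicon` are all assumptions for illustration; in_silico_pcr's own pattern construction may differ, and only the forward orientation of the read is searched here. Building the pattern once with `qr//` per primer pair, rather than once per read, is one way to keep the cost down on large read files:

```perl
use strict;
use warnings;

# Assumed IUPAC table: each (possibly degenerate) base becomes a regex fragment.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a primer into a regex fragment, base by base.
sub primer_to_regex {
    my ($primer) = @_;
    my $regex = '';
    for my $base (split //, uc $primer) {
        die "unknown base '$base'\n" unless exists $IUPAC{$base};
        $regex .= $IUPAC{$base};
    }
    return $regex;
}

# Reverse complement, including degenerate codes (R<->Y, K<->M, B<->V, D<->H).
sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse uc $seq) =~ tr/ACGTRYKMBVDHSWN/TGCAYRMKVBHDSWN/;
    return $rc;
}

# Build the amplicon pattern once per primer pair: forward primer, shortest
# stretch of anything, then the reverse complement of the reverse primer.
sub amplicon_regex {
    my ($fwd_primer, $rev_primer) = @_;
    my $fwd = primer_to_regex($fwd_primer);
    my $rev = primer_to_regex(revcomp($rev_primer));
    return qr/$fwd.*?$rev/;
}

# Return (0-based start, length) of the first forward-strand amplicon, or ().
sub find_amplicon {
    my ($seq, $re) = @_;
    return () unless uc($seq) =~ $re;
    return ($-[0], $+[0] - $-[0]);
}

my $re = amplicon_regex('ACGR', 'TTCA');
my ($start, $len) = find_amplicon('GGACGGTTTTGAACC', $re);
print "start=$start length=$len\n";    # start=2 length=11
```

The returned offsets are what a FASTQ version would apply to both the sequence line and the quality line.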

Parsing FASTQ files isn't too tough in Perl. You just read in 4 lines at a time, one record per loop iteration:

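```perl
use strict;
use warnings;
open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
while (<$in>){
    chomp(my $id = $_);             # line 1: read ID, starts with '@'
    chomp(my $sequence = <$in>);    # line 2: bases
    chomp(my $spacer = <$in>);      # line 3: '+' separator
    chomp(my $quality = <$in>);     # line 4: one quality character per base
    # etc...
}
close($in);
```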
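Extending that loop to write clipped FASTQ records could look like the sketch below. `next_fastq_record` and `clip_reads` are hypothetical names. To sidestep the speed concern, this version swaps the regex search for plain `index` string searches, so it only handles exact, non-degenerate primers and only the forward orientation of each read. The in-memory filehandle (`open` on a scalar reference) is just for the demo; a real run would open the read file instead:

```perl
use strict;
use warnings;

# Read one 4-line FASTQ record from $fh; returns an empty list at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    defined(my $id = <$fh>) or return;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record\n" unless defined $qual;
    chomp($id, $seq, $plus, $qual);
    die "record does not start with '\@': $id\n" unless $id =~ /^\@/;
    return ($id, $seq, $qual);
}

# Clip each read to FWD ... revcomp(REV) and print it as FASTQ to $out.
# Exact primers only: index() is used instead of a regex. Returns the number kept.
sub clip_reads {
    my ($in, $out, $fwd, $rev) = @_;
    (my $rev_rc = reverse uc $rev) =~ tr/ACGT/TGCA/;
    my $kept = 0;
    while (my ($id, $seq, $qual) = next_fastq_record($in)) {
        my $useq  = uc $seq;
        my $start = index($useq, uc $fwd);
        next if $start < 0;
        # Look for the reverse primer site only after the forward primer ends.
        my $end = index($useq, $rev_rc, $start + length $fwd);
        next if $end < 0;
        my $len = $end + length($rev_rc) - $start;
        # Same offsets for bases and qualities keep each base paired with its score.
        print {$out} "$id\n", substr($seq, $start, $len), "\n+\n",
                     substr($qual, $start, $len), "\n";
        $kept++;
    }
    return $kept;
}

my $fastq = "\@r1\nTTACGGTTTTGAACC\n+\nABCDEFGHIJKLMNO\n"
          . "\@r2\nCCCCCCCC\n+\nIIIIIIII\n";
open(my $in, '<', \$fastq) or die "Can't open in-memory FASTQ: $!\n";
my $n = clip_reads($in, \*STDOUT, 'ACGG', 'TTCA');
print STDERR "kept $n read(s)\n";
```

Here `r1` is printed as `ACGGTTTTGAA` with qualities `CDEFGHIJKLM`, and `r2`, which contains neither primer, is dropped.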

I don't have the time to write a whole new program for this right now, but feel free to adapt as much as you please from my code (which is itself an adaptation of Joseba Bikandi's PHP code).

Thanks.