Closed vsoch closed 7 years ago
oh wow that was much faster! I changed it to:
grep -Eo '@channel_[0-9]+' $all_reads > $output_folder/06_poisson_calculation/01_reads_channels.lst
and now the file has a very long list that looks like this:
@channel_180
@channel_180
@channel_182
@channel_182
@channel_182
@channel_182
@channel_182
@channel_182
@channel_182
@channel_182
@channel_182
[truncated]
I'll hold off on trying out the next steps before we chat, it's likely that this command needs to be tweaked, and (I think you are on the East Coast?) and (hopefully) already sleeping. Chat later!
Hi @vsoch, the current grep command is definitely not optimal. For this particular data set, it took my iMac (Late 2014, 4Ghz i7, 32 GB, w/ Solid State Drive) about 2 minutes.
However, from my experience, I have had the most consistent success (with various MinKnow, Albacore, and Poretools versions) by either searching for the channel in the .fast5 file path _ch[0-9]+_
when converting basecalled fast5 files to fastq or for ch=[0-9]+
if the fastq was generated from MinKnow during live basecalling or from Albacore with -o fast5,fastq
enabled.
There is something similar here and here before the "good read vs. bad read" sorting step.
@vsoch I forgot to mention that the header @header_abc_etc
will not always include the channel information @channel_123_header_abc_etc
depending on the software or software version.
ah this makes sense, I didn't know the headers varied. I think the original is good then, thanks for the help.
And a heads up I'm going to do a small PR later to adjust some of the environment variables (just found a small typo for a path)
Hey @amojarro ! I am looking at the
$all_reads
provided (thanks again!) and it looks like a read has a header like this:and I am thinking we may need to update the grep line to find them?
Specifically it would be this step. Would a more general grep for
@channel
work? I'm not trained in variant calling so I wanted to have a discussion vs. coming up with something on my own. How long does this step usually take?