Closed ke-shi closed 6 years ago
Hi ke-shi,
Thanks for pointing this out, it appears there is a bug here. The dataset
tool recognizes the qname.list file, but the underlying code for parsing bams does not.
The workaround at the moment if you want to create a dataset that properly filters based on a file of qnames is to use the keyword "qname_file" in your filter command.
dataset filter in.dataset.xml filtered.dataset.xml 'qname_file=/path/to/qname.list'
This will generate some errors ("unrecognized filter") and the dataset summarize
tool will not properly count filtered records, but underlying tools will properly use the qname file for whitelisting.
Incidently, you can accomplish these two steps (dataset filter
&& dataset consolidate
) in a single step using the tool bamsieve
which takes pacbio bam/xml as input and subsets the data according to white-/blacklists or randomly. The tool should be available in your smrtanalysis v5.1 command path, bamsieve --help
Thanks jrharting,
"qname_file" is working!!
bamsieve
is also tested. It is easy to use but subsets the data based on ZMW not subread names...
We tried to filter reads from a bam file using the following methods.
A file "textFile1.list" listing
dataset filter in.dataset.xml filtered.dataset.xml 'qname=textFile1.list'
The resultant filtered.dataset.xml contained:
dataset consolidate filtered.dataset.xml consolidate.subreads.bam out_fn.subreadset.xml
as described in the pages 21 to 22 of SMRT Tools Reference Guide.However, we got the error:
We know "textFile1.list" is not subread names but the file name listing QNAMEs... Could you let me know what's wrong in our code?