Closed nh3 closed 6 years ago
Hi, I'm not sure if such a general approach is really useful. Do you have any other use cases in mind? This one is already covered by poorly documented chimeric
flag (https://github.com/lomereiter/sambamba/issues/185)
There will be plenty of use cases if simple arithmetics are allowed at the same time, e.g. '[NM] < sequence_length/30'
, but I guess it might be too involved to implement. The chimeric
flag is really useful. Thanks!
Alright, I've looked into Duktape today, integrating a JS interpreter is going to be much easier than modifying the current parser.
Minimal working example looks simple enough:
import bio.bam.reader;
import duktape; // extern(C) declarations from duktype.h
import std.stdio;
extern(C) duk_ret_t
read_sequence_length(duk_context ctx) {
auto read = cast(BamRead*)duk_get_pointer(ctx, 0);
duk_push_int(ctx, read.sequence_length);
return 1;
}
void main(string[] args) {
auto read = new BamReader(args[1]).reads.front;
duk_context ctx = duk_create_heap_default();
scope(exit) duk_destroy_heap(ctx);
// register the function globally
duk_push_c_function(ctx, &read_sequence_length, 1);
duk_put_global_string(ctx, "sequenceLength");
duk_eval_string(ctx, "(function(r) {
return sequenceLength(r) > 40;
})"); // stack: [func]
duk_push_pointer(ctx, &read); // stack: [func bamread]
duk_call(ctx, 1); // stack: [result]
duk_bool_t result = duk_get_boolean(ctx, -1);
duk_pop(ctx);
writeln("filter result:", to!bool(result));
}
It's amazing that you come up with a solution so quickly! Speed-wise, how do you expect it to compare with the current parser or piping into awk?
Actually I thought of using Lua/LuaJIT long ago, but it just feels weird to use it outside gamedev. JS is more familiar but until recently the choice was between monstrous SpiderMonkey/V8.
Speed-wise it should be somewhere in the middle, I'd expect closer to the current parser. Awk introduces pretty large overhead of serializing and then parsing again, including fields that don't even participate.
Update: Duktape proved to be too slow, albeit faster than AWK. I may look into LuaJIT later.
Here's another interesting project which uses LLVM JIT: https://github.com/BoutrosLaboratory/bamql http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1162-y
Would it be possible to allow comparison between fields/tags? This would be very useful for things like
'ref_id != mate_ref_id'
. Thanks.