COMBINE-lab / piscem

Rust wrapper for the next generation (still currently in C++)
BSD 3-Clause "New" or "Revised" License

Possible to parse/view a piscem map-bulk RAD file? #21

Open dduchen opened 1 month ago

dduchen commented 1 month ago

Hello, thank you for developing this suite of tools. I'm very interested in leveraging the piscem map-bulk and piscem-infer workflow to process some joint host-metagenomic sequencing data, and am hoping to parse the pseudomapping results. For example, after quantification, I'd like to interrogate which reads are assigned/mapped to specific reference/index contigs/transcripts.

The alevin-fry view command doesn't seem to work for piscem map-bulk RAD files.

If this isn't currently possible via 'viewing' the RAD file, is there an alternative way you recommend I accomplish this?

Thank you!

rob-p commented 1 month ago

Hi @dduchen,

Thanks for the question and for your interest in our pipeline here! Indeed, the alevin-fry view command is specifically for single-cell RAD files. Writing a viewer for bulk RAD files is quite straightforward given our libradicl library, though we've not made such a stand-alone program.

If this is something that you're interested in having, you can poke around the examples in the libradicl crate to see how you might write one. Otherwise, this would be a generally useful tool, and we'd be happy to help put one together and make it available! It wouldn't take someone familiar with libradicl (i.e. someone in the lab) more than a few hours to put together a basic viewer. However, we probably won't get around to it until next week or so.

--Rob

dduchen commented 1 month ago

Thank you for your response, and for providing me with another reason to learn Rust. I likely won't be able to write anything myself, so any help on making this a function of piscem would be greatly appreciated. There's no immediate rush, but I'm definitely planning on incorporating piscem / piscem-infer into some workflows I'm developing to interrogate both bulk and single-cell metagenomic/metatranscriptomic host-pathogen interactions. Thanks again!

rob-p commented 4 weeks ago

Hi @dduchen,

Sorry for the delay, but I think I now have something that you can test out. There were a couple of different requests for odds-and-ends related to RAD files, so I figured it would make sense to have a dedicated repository where such tools could live. So I created a new radtk repository. Right now, you'll need to compile it from source, but that should be a simple cargo build --release. If you need an executable, then I can set up the repository to cut one when a release is made.

The tool you'll want to look at is view, so your command would look something like:

radtk view -r bulk -i <input_file> -o <output_json>

This will dump the RAD file into a JSON-format file. If you leave -o off, it will write directly to stdout. Also, if you don't want to print the header, you can pass the --no-header flag, and if you want to use names rather than IDs for the mapped targets, you should pass the --use-ref-name flag. Please let me know if you get a chance to try this out. I've only tested it at a small scale so far, and it's very much under development, so I'm open to suggestions and feedback.

P.S. I actually went ahead and set up the GitHub actions for building binaries, so you can grab a binary release to test this out if that's easier.
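For anyone scripting this inside a larger workflow, here is a minimal Python sketch of how the command above could be assembled and run. It uses only the flags mentioned in this thread (-r bulk, -i, -o, --no-header, --use-ref-name). The function names and the file name map.rad are hypothetical, and the sketch assumes a radtk executable is on your PATH. The layout of the JSON output isn't described here, so the sketch returns the raw text rather than parsing it.

```python
import subprocess
from typing import List, Optional


def build_radtk_view_cmd(
    input_rad: str,
    output_json: Optional[str] = None,
    no_header: bool = False,
    use_ref_name: bool = False,
) -> List[str]:
    """Assemble the argument list for `radtk view` on a bulk RAD file.

    Hypothetical helper: only the flags named in this thread are used.
    """
    cmd = ["radtk", "view", "-r", "bulk", "-i", input_rad]
    if output_json is not None:
        # Without -o, radtk view writes to stdout (per the comment above).
        cmd += ["-o", output_json]
    if no_header:
        cmd.append("--no-header")
    if use_ref_name:
        # Report mapped targets by name rather than by numeric ID.
        cmd.append("--use-ref-name")
    return cmd


def run_radtk_view(input_rad: str, **options) -> str:
    """Run `radtk view` and return whatever it printed to stdout.

    Assumes `radtk` is installed and on PATH; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    cmd = build_radtk_view_cmd(input_rad, **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, build_radtk_view_cmd("map.rad", use_ref_name=True) yields the argument list for radtk view -r bulk -i map.rad --use-ref-name, which you can inspect before actually running anything.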

rob-p commented 3 weeks ago

Hi @dduchen,

Any chance to try this out yet?