MrOlm / inStrain

Bioinformatics program inStrain
MIT License

Get Raw Data from readANI_distribution plot #62

Open elizabethmcd opened 3 years ago

elizabethmcd commented 3 years ago

I would like to get the raw data behind the read ANI distributions shown in the provided plot. I don't think this is accessible through the existing raw files that are output with the API. Is inStrain filtering the BAM file based on the PID flag with something like bedtools (I think that's how I've done it before)? I just want to parse the same information that is shown in the readANI plot so I can do downstream analyses across multiple samples mapped to the same genome.

MrOlm commented 3 years ago

Hello,

Yes, you're right that at the moment this information isn't accessible through the public API. The next big inStrain upgrade will add functionality to parse the raw_data and create tables like this on demand, but unfortunately that won't be ready for a while.

The information for this plot is generated using the function prepare_read_ani_dist_plot, in the file inStrain/plotting/mapping_plots.py (https://github.com/MrOlm/inStrain/blob/143058f94753d3225b55a225b1feab4227079ac9/inStrain/plotting/mapping_plots.py#L153)

The function takes an inStrain profile object and loads the covT and scaffold2length objects from raw_data to create this information. You should be able to either import and call it directly, or copy the code into a new Python notebook and run it yourself. That said, I realize that this isn't a very straightforward process, and I'm happy to help if you run into snags.
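As a rough sketch of the "import and call" route for several samples, something like the following might work. It assumes the profile folder can be opened with inStrain.SNVprofile.SNVprofile and that prepare_read_ani_dist_plot still lives in inStrain.plotting.mapping_plots in your installed version (the link above is pinned to one commit). The helper names load_read_ani_table and collect_tables are hypothetical, made up for illustration, and the return type of prepare_read_ani_dist_plot isn't stated here, so inspect what comes back before building on it.

```python
import importlib
from pathlib import Path


def load_read_ani_table(profile_dir):
    """Hypothetical helper: run prepare_read_ani_dist_plot on one profile folder."""
    # Imported lazily so this file can be loaded without inStrain installed.
    snv_module = importlib.import_module("inStrain.SNVprofile")
    plot_module = importlib.import_module("inStrain.plotting.mapping_plots")

    # Assumption: SNVprofile(path) opens an existing inStrain profile output folder.
    profile = snv_module.SNVprofile(str(profile_dir))

    # The function named in this thread; it reads covT and scaffold2length
    # from raw_data. Its return type is not stated here, so check the result.
    return plot_module.prepare_read_ani_dist_plot(profile)


def collect_tables(profile_dirs, loader=load_read_ani_table):
    """Hypothetical helper: map each profile folder name to the loader's result."""
    tables = {}
    for profile_dir in profile_dirs:
        sample = Path(profile_dir).name  # folder name doubles as the sample label
        tables[sample] = loader(profile_dir)
    return tables
```

Calling collect_tables with a list of profile folders (all mapped to the same genome) would give one entry per sample, keyed by folder name. The loader argument is only there so the per-sample step can be swapped out, for example with a copy of the function pasted into a notebook.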

Best,

elizabethmcd commented 3 years ago

Thanks! I will definitely try importing the method and see if I can get what I need from there.