cfe-lab / proviral


Tool for plotting proviral landscapes #16

Closed CBeelen closed 1 year ago

CBeelen commented 1 year ago

Users have asked for a tool to plot proviral landscapes / virograms, which visualise the composition and the nature of the defects in the proviral reservoir. Examples of these plots can be found in Natalie's thesis, starting on page 148. Natalie has also given us her R tool to create these plots (in macdatafile under Natalie/Data/ReservoirStudiesData/CharlotteDemoStuff).

We will either use her plotting tool as is, tweak it, or rewrite it in Python. The data required for the plots are an alignment to HXB2 and the proviral pipeline's verdict on each sequence (intact, hypermutated, etc.).

The main difficulty when generating these plots is that the sequences for one participant may be spread across several runs, and we cannot easily identify them by their sample names. So users will have to manually select which sequences to include in their plots. Also, users have asked for the option to manually override some of the plotting input data, for example the alignments.

For now, the suggested solution is that the proviral pipeline will generate a collated CSV file containing all of the data needed to make this plot. Users can then copy and paste the rows for the sequences they want to include into a new CSV file and upload that file to the plotting tool, which could live on the BBLab tools webpage, for example, to generate the plot. This will also allow them to manually modify alignments, should they wish to.
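As a rough illustration of the row-selection step, here is a minimal Python sketch that copies the chosen rows from a collated CSV into a new one. The column names (`sequence_name`, `run`, `verdict`) and the function names `select_rows` and `write_rows` are made up for this example; the real layout of the collated file has not been decided in this issue.

```python
import csv
import io


def select_rows(collated_csv, wanted_names, name_column="sequence_name"):
    """Return (fieldnames, rows) for the rows whose name is in wanted_names.

    collated_csv is an open text file (or any iterable of CSV lines).
    name_column is a hypothetical column name, not the pipeline's actual one.
    """
    reader = csv.DictReader(collated_csv)
    wanted = set(wanted_names)
    # Keep the original row order so the plot input stays predictable.
    rows = [row for row in reader if row[name_column] in wanted]
    return reader.fieldnames, rows


def write_rows(fieldnames, rows, out_file):
    """Write the selected rows, with the original header, to out_file."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Invented example data: three sequences spread across two runs.
    collated = io.StringIO(
        "sequence_name,run,verdict\n"
        "A,run1,intact\n"
        "B,run2,hypermutated\n"
        "C,run2,intact\n"
    )
    fields, selected = select_rows(collated, ["A", "C"])
    out = io.StringIO()
    write_rows(fields, selected, out)
    print(out.getvalue())
```

Selecting by an explicit list of names mirrors the manual copy-and-paste step, so sequences from different runs can end up in the same plot input regardless of how the samples were named.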

CBeelen commented 1 year ago

Once the CSV with the plotting data is being uploaded to the database as per #17, we will have a few more options for generating the CSV or the plots. For example, users could select sequences in QAI and have QAI generate the collated CSV, or even the proviral landscape plot, for them.