Closed superkostya closed 7 years ago
Cool, really exciting. Can't wait to take a deeper look.
Can you export the notebook to a script for easier code review? From explore, run:
jupyter nbconvert --to=script Visualization_with_Vega-Lite_and_Altair.ipynb
Also I suggest moving this analysis to a new directory inside of explore. In this directory, you can also export a vega-lite-heatmap.json specification as its own file.
Done. The created JSON file has a few changes already applied to it to improve the appearance. As I pointed out in the notebook, more formatting options need to be explored.
I think the next step is to touch up the vega-lite specification separately from altair. You've started to do this in your final notebook cell. What I think would be ideal is to separate the JSON for the dataset from the JSON of the vega-lite spec.
See this function for exporting a pandas.DataFrame to the vega-lite JSON specification. Once you upload the data to GitHub, you can modify the vega-lite spec to load the data from a URL (example).
Then we'll be able to give the JSON spec directly to the frontend and they'll generate the data.
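To make the proposed separation concrete, here is a minimal sketch using only the Python standard library. It assumes a spec that currently embeds its rows under "data" → "values" and rewrites it to reference a sibling file through "data" → "url". The helper name split_spec, the file name heatmap-data.json, and the placeholder encoding (including the rect mark, which assumes Vega-Lite 2 or later) are illustrative and not taken from the notebook.

```python
import json
import tempfile
from pathlib import Path


def split_spec(spec, data_filename):
    """Return (spec_with_url, values) for a spec that embeds data.values.

    The input spec is left unmodified.
    """
    values = spec["data"]["values"]
    # Copy every top-level key except "data", then point "data" at the file.
    slim = {key: value for key, value in spec.items() if key != "data"}
    slim["data"] = {"url": data_filename}
    return slim, values


# Placeholder spec: the encoding is an assumption made for illustration.
inline_spec = {
    "mark": "rect",
    "encoding": {
        "x": {"field": "gene_symbol", "type": "nominal"},
        "y": {"field": "disease", "type": "nominal"},
        "color": {"field": "count", "type": "quantitative"},
    },
    "data": {
        "values": [
            {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA",
             "count": 0.01282051282051282},
            {"disease": "adrenocortical cancer", "gene_symbol": "AMOT",
             "count": 0},
        ]
    },
}

slim_spec, values = split_spec(inline_spec, "heatmap-data.json")

# Write both files side by side so the relative URL resolves.
out_dir = Path(tempfile.mkdtemp())
(out_dir / "heatmap-data.json").write_text(json.dumps(values, indent=2))
(out_dir / "vega-lite-heatmap.json").write_text(json.dumps(slim_spec, indent=2))
```

Once the data file is hosted, only the "url" value in the spec would need to change to the hosted location.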
@bdolly currently @superkostya is generating the heatmap from the following data structure:
"data": {
"values": [
{
"disease": "adrenocortical cancer",
"gene_symbol": "AJUBA",
"count": 0.01282051282051282
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOT",
"count": 0
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOTL1",
"count": 0
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "AMOTL2",
"count": 0.01282051282051282
},
{
"disease": "adrenocortical cancer",
"gene_symbol": "LATS1",
"count": 0
}
]
}
Each value encodes a single cell in the heatmap and is a (disease, gene, frequency-of-mutation) combination. The idea is for the heatmap to show all of the diseases and genes the user has selected. We can obviously change what types of IDs we're using for genes and diseases.
@bdolly can the frontend generate the above data structure? Or should we accommodate a different input data structure?
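For reference, the one-record-per-cell structure above can be produced from a disease-by-gene matrix with a few lines of plain Python. This is only a sketch: the function name to_long_records and the matrix layout (a dict mapping each disease to a row of frequencies in gene order) are assumptions, and the notebook itself may build the records differently.

```python
def to_long_records(matrix, genes):
    """Flatten {disease: [freq per gene]} into one record per heatmap cell."""
    records = []
    for disease, row in matrix.items():
        if len(row) != len(genes):
            raise ValueError(f"row for {disease!r} does not match gene list")
        for gene, freq in zip(genes, row):
            records.append(
                {"disease": disease, "gene_symbol": gene, "count": freq}
            )
    return records


genes = ["AJUBA", "AMOT", "AMOTL1", "AMOTL2", "LATS1"]
matrix = {
    "adrenocortical cancer": [0.01282051282051282, 0, 0, 0.01282051282051282, 0],
}
records = to_long_records(matrix, genes)
```

Each selected disease contributes one record per selected gene, so the payload grows as (number of diseases) × (number of genes).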
Hi Kostyra, and Daniel,
Happy New Year.
Sorry to be late replying, just getting back into the swing of things.
The Chart object can take a file/url argument instead of a dataframe. This is how I’ve been doing it:
from altair import Chart

# heatmap cell size in pixels, matches default text size
# in jupyter notebook.
hm_cell_pixel_size = (8, 8)
hm_data_url = '3-tcga-hmdata.csv'
# hm_df is the tidied/normalized dataframe previously computed,
# or passed in once this is made into a function
hm_df.to_csv(hm_data_url)
hm_chart_url = '3-tcga-hmchart.json'
hm_chart = Chart(hm_data_url).mark_text(
    # other parameters, ...
)
# write the chart spec; the with-block closes the file for us
with open(hm_chart_url, 'w') as hm_chart_file:
    print(hm_chart.to_json(indent=2), file=hm_chart_file)
# altair chart display must be on the last line of jupyter cell
# this is a gotcha I found buried in the altair documentation
hm_chart
Minor nit: The TOTAL column should be moved to the right. This should be an easy slice and dice. Better yet, make it a parallel, single column heatmap, as it is not a gene_symbol. Compute it as part of the heatmap display process, rather than in the disease/gene_symbol dataframe as is currently done in "3.TCGA-MLexample_Pathway"
Management of the file name space, and deletion of .csv and .json files when no longer needed will need to be coordinated.
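One possible way to handle the cleanup, sketched with the standard library only: write the intermediate files into a temporary directory that is deleted when the block exits. The file names are the ones used above; the sample record and the bare-bones spec are placeholders. This only fits if nothing outside the block needs the files, so it would not apply as-is if the frontend has to fetch them later.

```python
import csv
import json
import tempfile
from pathlib import Path

# Placeholder record in the (disease, gene_symbol, count) shape.
records = [
    {"disease": "adrenocortical cancer", "gene_symbol": "AJUBA",
     "count": 0.01282051282051282},
]

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "3-tcga-hmdata.csv"
    chart_path = Path(tmp) / "3-tcga-hmchart.json"

    # Write the heatmap data as CSV with a header row.
    with data_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["disease", "gene_symbol", "count"]
        )
        writer.writeheader()
        writer.writerows(records)

    # Write a placeholder spec that refers to the CSV by relative name.
    chart_path.write_text(json.dumps({"data": {"url": data_path.name}}, indent=2))

    existed_inside = data_path.exists() and chart_path.exists()

# Leaving the with-block removes the directory and both files.
existed_after = data_path.exists() or chart_path.exists()
```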
Daniel, several changes have been made per your suggestion:
"url": "./heatmap_data_Altair_compatible.json"
Nice, looks almost ready to merge.
Can we rename explore/visualization_vega_lite_altair/ to explore/heatmap-vega-lite/?
Would be nice if we could change the underscores to dashes in paths. So Visualization_with_Vega-Lite_and_Altair.ipynb becomes Visualization-with-Vega-Lite-and-Altair.ipynb. Or even simplify to heatmap.ipynb.
Done. Files and the main directory are renamed per your suggestion.
This is a preliminary result for using the combination of Vega-Lite and Altair to visualize some of the obtained results, e.g. heatmaps. The main objective is to take advantage of the lean and sufficiently flexible JSON format for the graphs in Vega-Lite, which should allow us to generate the figures (at least some of them) on the front end, thereby reducing the Internet traffic and increasing the performance and speed.
The changes are as follows: