INCATools / biosample-analysis

analysis of biosamples in INSDC

Only collect harmonized_name when building sample TSV #6

Closed: cmungall closed this issue 4 years ago

cmungall commented 4 years ago

We only care about harmonized_name (i.e., MIxS) for now. We can analyze the other attributes later.

This should substantially reduce both the time it takes to build the table and the size of the resulting file.
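A minimal sketch of the proposed filter, assuming the NCBI BioSample XML layout (each `BioSample` element has an `accession` attribute, and each `Attribute` child carries an optional `harmonized_name`). This is not the project's actual build script; `harmonized_rows` and `write_tsv` are hypothetical names. Any attribute without a `harmonized_name` is dropped before it ever becomes a column, which is where the time and size savings come from.

```python
import csv
import io
import xml.etree.ElementTree as ET


def harmonized_rows(xml_source):
    """Yield (accession, {harmonized_name: value}) for each BioSample element.

    Attributes that lack a harmonized_name are skipped entirely.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "BioSample":
            continue
        attrs = {}
        for attr in elem.iter("Attribute"):
            name = attr.get("harmonized_name")
            if name:  # keep only MIxS-harmonized attributes
                attrs[name] = (attr.text or "").strip()
        yield elem.get("accession"), attrs
        elem.clear()  # release the processed sample's children


def write_tsv(xml_source, out):
    """Write a wide TSV: one row per sample, one column per harmonized_name."""
    rows = list(harmonized_rows(xml_source))
    columns = sorted({name for _acc, attrs in rows for name in attrs})
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession"] + columns)
    for acc, attrs in rows:
        writer.writerow([acc] + [attrs.get(c, "") for c in columns])
```

For example, `write_tsv(io.BytesIO(xml_bytes), io.StringIO())` on a sample containing a harmonized `depth` attribute and an unharmonized `lab_note` attribute produces a table with only the `accession` and `depth` columns.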

wdduncan commented 4 years ago

I created an EAV (entity-attribute-value) table using this script: https://github.com/INCATools/biosample-analysis/blob/master/util/harmonized-name-eav.pl

There are 447 harmonized attributes. See the notebook: https://nbviewer.jupyter.org/github/INCATools/biosample-analysis/blob/master/src/notebooks/analyze-harmonized-data.ipynb
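The linked script is written in Perl; the sketch below is only a Python illustration of the same EAV idea, again assuming the NCBI BioSample XML layout (`BioSample` elements with an `accession`, `Attribute` children with an optional `harmonized_name`). The function names `eav_rows` and `write_eav` are hypothetical. Each harmonized attribute becomes one `(accession, harmonized_name, value)` row, and the number of distinct harmonized attributes is simply the number of keys in the returned counter.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def eav_rows(xml_source):
    """Yield one (entity, attribute, value) triple per harmonized attribute."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "BioSample":
            continue
        accession = elem.get("accession")
        for attr in elem.iter("Attribute"):
            name = attr.get("harmonized_name")
            if name:  # skip attributes NCBI did not harmonize
                yield accession, name, (attr.text or "").strip()
        elem.clear()  # release the processed sample's children


def write_eav(xml_source, out):
    """Write the triples as a 3-column TSV; return per-attribute row counts."""
    counts = Counter()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "harmonized_name", "value"])
    for row in eav_rows(xml_source):
        writer.writerow(row)
        counts[row[1]] += 1
    return counts
```

One reason to prefer EAV over a wide table here is that rows can be streamed out as each sample is parsed, without first collecting the full set of column names, and sparse attributes do not produce mostly empty columns.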

cmungall commented 4 years ago

Where is the TSV deposited?

(Email reply to Bill Duncan's comment of Tue, Sep 1, 2020, 19:15.)

wdduncan commented 4 years ago

I haven't uploaded it yet. But I will :)

wdduncan commented 4 years ago

Details on where the harmonized data are stored are in the main README. cc @cmungall