CharlesJB / ENCODExplorer

Changes to ENCODE metadata #45

Closed ericfournier2 closed 5 years ago

ericfournier2 commented 5 years ago

The following attributes have disappeared from experiment objects:

The following columns are new:

biosample_type and biosample_term_name were part of the final encode_df.

biosample_type entries are now indices into the new biosample_type table (e.g., "/biosample-types/cell_line_EFO_0001182/", "/biosample-types/primary_cell_CL_1000458/").

The biosample_type table has the following columns. Columns which directly reference old experiment columns are marked with *. Columns which were part of encode_df are marked with **.

The lack of those columns causes export_ENCODEdb_matrix to fail when reordering columns. Three actions are possible, from required to optional:

  1. At the very least, the function must no longer crash; it should silently drop the missing columns instead.
  2. If possible, the columns from biosample_type which were previously in encode_df should now be joined into the experiment table so that information isn't lost.
  3. To go above and beyond the call of duty, the new columns of biosample_type which have no equivalent in the old data frame should also be added.
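
A minimal base R sketch of actions 1 and 2, not the actual ENCODExplorer code. The helper names `reorder_existing_columns` and `join_biosample_type` are hypothetical, as are the `id`, `classification` and `term_name` column names and the toy rows; only the two "/biosample-types/..." keys come from the examples above.

```r
# Action 1 (sketch): reorder columns, silently skipping any that are missing.
# `wanted` is the desired column order; names absent from `df` are dropped.
reorder_existing_columns <- function(df, wanted) {
  present <- wanted[wanted %in% colnames(df)]
  df[, present, drop = FALSE]
}

# Action 2 (sketch): join the biosample_type table onto the experiment table.
# Assumes the key column of the biosample_type table is called "id".
join_biosample_type <- function(experiment, biosample_type) {
  merge(experiment, biosample_type,
        by.x = "biosample_type", by.y = "id",
        all.x = TRUE)  # keep experiments even if no biosample_type row matches
}

# Toy data (invented for illustration).
experiment <- data.frame(
  accession = c("EXP1", "EXP2"),
  biosample_type = c("/biosample-types/cell_line_EFO_0001182/",
                     "/biosample-types/primary_cell_CL_1000458/"),
  stringsAsFactors = FALSE
)
biosample_type <- data.frame(
  id = c("/biosample-types/cell_line_EFO_0001182/",
         "/biosample-types/primary_cell_CL_1000458/"),
  classification = c("cell line", "primary cell"),
  term_name = c("toy term A", "toy term B"),
  stringsAsFactors = FALSE
)

joined <- join_biosample_type(experiment, biosample_type)

# "no_such_column" does not exist, so it is skipped instead of raising an error.
final <- reorder_existing_columns(
  joined, c("accession", "term_name", "no_such_column")
)
```

In the real function the requested order would be whatever column list export_ENCODEdb_matrix already uses; filtering that list against `colnames(df)` before subsetting is what avoids the crash.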