PNNL-CompBio / coderdata

Automation scripts and benchmark dataset package for cancer drug prediction deep learning models.

Various Data fixes and Schema Alignments needed. #115

Closed by jjacobson95 5 months ago

jjacobson95 commented 5 months ago

This may not be up to date, as I'm using some of the Synapse data generated last week, but I'm just trying to get ahead of a few fixes.

jjacobson95 commented 5 months ago

I'll note other issues as I find them in this thread.

Schema alignments - all of the following must be converted from float to integer:

Never mind on these - I think this is just how pandas is reading them into the dataframes.
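
For anyone who hits the same thing, here is a minimal sketch of how pandas can turn an integer column into floats on read. The CSV contents and the column names (`entrez_id`, `improve_sample_id`) are made up for illustration and are not taken from the actual build files; the assumption is that the affected columns contain at least one missing value.

```python
import io

import pandas as pd

# Hypothetical CSV: the second row has no entrez_id value.
csv_text = "entrez_id,improve_sample_id\n7157,1\n,2\n672,3\n"

# Default read: the missing value forces entrez_id to float64,
# because NumPy's int64 has no way to represent NaN.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.dtypes)

# Asking for the nullable "Int64" dtype keeps whole numbers and
# stores the gap as <NA> instead of casting the column to float.
df_nullable = pd.read_csv(io.StringIO(csv_text), dtype={"entrez_id": "Int64"})
print(df_nullable.dtypes)
```

If that is what is happening here, the files themselves would be fine and only the in-memory dtype differs.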

sgosline commented 5 months ago

I'm guessing these were all a result of my working off of main instead of the builder branch, which was way ahead. I'll get to these shortly; I'm still fixing another Docker build issue.

jjacobson95 commented 5 months ago

Okay, taking a look at the new build (2024_03_20), here are a couple of updates needed to get these aligned to the schema and working with the package:

jjacobson95 commented 5 months ago

Another issue with the 2024_03_20 build that I just found: the DepMap proteomics file is missing most of its data. There is proteomics info for only a single sample.
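
A quick check along these lines could catch this kind of truncation early. This is only a sketch: the `improve_sample_id` and `proteomics` column names and the `count_samples` helper are assumptions for illustration, and the small in-memory frame stands in for the real file.

```python
import pandas as pd


def count_samples(df, sample_col="improve_sample_id"):
    """Return the number of distinct sample IDs in a table (hypothetical helper)."""
    return df[sample_col].nunique()


# Stand-in for a truncated proteomics table: many rows, but one sample.
proteomics = pd.DataFrame(
    {
        "improve_sample_id": [42, 42, 42],
        "entrez_id": [7157, 672, 1956],
        "proteomics": [0.12, -0.40, 1.05],
    }
)

n_samples = count_samples(proteomics)
if n_samples < 2:
    print(f"Only {n_samples} sample(s) found; the file may be truncated.")
```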

sgosline commented 5 months ago

Please create separate issues so I can tag them. This was an easy fix and is going in the latest PR.

sgosline commented 5 months ago

#110 fixes these.