murraycutforth closed 10 months ago
I fixed it by doing a left join, and then the WTs are preserved. I may not have pushed it, since I thought I would work on it this weekend and then didn't. I will look at it again tonight and then merge.
Okay cool. A left join would force every single well into the final table, but then we need to go back and remove wells which correspond to blanks (A1, maybe others in the final plate if it's half full), and put values in for the WT rows, so I think ultimately it would end up looking similar to what I wrote (add rows first and then inner join).
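For reference, a rough sketch of what the left-join route might look like. The column names (plate, well, value, mutant_id), the example values, and the blank_wells / wt_wells sets are all made up for illustration and are not taken from the project code.

```python
import pandas as pd

# Hypothetical experimental data: one row per (plate, well), including the blank A1.
experiment_df = pd.DataFrame({
    "plate": ["P1", "P1", "P1", "P1"],
    "well": ["A1", "A2", "A3", "A4"],
    "value": [0.02, 0.51, 0.48, 0.93],
})

# Hypothetical identity table: no entries for the blank (A1) or the wild type (A4).
identity_df = pd.DataFrame({
    "plate": ["P1", "P1"],
    "well": ["A2", "A3"],
    "mutant_id": ["mut_001", "mut_002"],
})

# Assumed bookkeeping of which wells are blanks and which hold wild type.
blank_wells = {("P1", "A1")}
wt_wells = {("P1", "A4")}

# Left join: every experimental well survives, with NaN where no identity exists.
merged = experiment_df.merge(identity_df, on=["plate", "well"], how="left")

# Build per-row boolean masks from the (plate, well) keys.
keys = list(zip(merged["plate"], merged["well"]))
is_blank = pd.Series([k in blank_wells for k in keys], index=merged.index)
is_wt = pd.Series([k in wt_wells for k in keys], index=merged.index)

# Fill in the WT identity, then drop the blank wells.
merged.loc[is_wt, "mutant_id"] = "WT"
result = merged[~is_blank].reset_index(drop=True)
```

Either way the same two pieces of information are needed (which wells are blanks, which are WT), which is why the two approaches end up looking similar.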
Btw, this is also probably a sign that this branch is getting too broad. I think we should merge it into main and then start new issues + branches if there is more work to be done, to avoid this problem of working on top of each other. Maybe not a problem any more since I don't anticipate working any more on the database creation myself? I'll just have a play around on the data-analysis branch this week.
P.S. my gut feeling is that there is a lot of scope for bugs to sneak in during the database creation step so if you can think of any more sanity check assertions to add then that would be really good.
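As a starting point, here is a minimal sketch of the kind of sanity checks that could run on the merged table. The function name check_merged_table, the column names, and the "WT" label are assumptions for illustration, not names from the project.

```python
import pandas as pd


def check_merged_table(merged: pd.DataFrame, n_expected_rows: int) -> None:
    """Hypothetical sanity checks on the merged experiment/identity table."""
    key_cols = ["plate", "well"]

    # Each (plate, well) pair should appear exactly once after the merge.
    assert not merged.duplicated(subset=key_cols).any(), "duplicate plate/well rows"

    # Every remaining row should have an identity (mutant or WT).
    assert merged["mutant_id"].notna().all(), "rows with missing identity"

    # The merge should neither drop nor invent rows relative to expectation.
    assert len(merged) == n_expected_rows, "unexpected number of rows"

    # Wild-type rows should have survived the merge.
    assert (merged["mutant_id"] == "WT").any(), "no WT rows in merged table"


# Usage with a small made-up table.
example = pd.DataFrame({
    "plate": ["P1", "P1", "P1"],
    "well": ["A2", "A3", "A4"],
    "value": [0.51, 0.48, 0.93],
    "mutant_id": ["mut_001", "mut_002", "WT"],
})
check_merged_table(example, n_expected_rows=3)
```

pandas can also do some of this at merge time: DataFrame.merge accepts validate="one_to_one" to raise on duplicated keys, and indicator=True to add a _merge column showing which rows matched.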
I agree, I think we can return to this as needed
@codercahol I think we're ready to merge this branch into main now. Since you added the parquet files, I have done some refactoring of the hardcoded paths in the project to avoid any repetition between files (all paths/functions to cached files and downloaded data are in paths.py). I also realised that the merging step between the experimental data df and the identity df was throwing away all wild type data. This is because there are no entries for WT in the identity spreadsheet, and the merge is an inner join. I think I've fixed this by just manually adding rows to the identity dataframe for all WT plates/wells, but could you check this? I've re-run and updated the database cache on the shared Google Drive.
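To illustrate the problem, a minimal sketch with made-up data. The column names (plate, well, value, mutant_id) and the values are hypothetical and are not the project's actual schema.

```python
import pandas as pd

# Hypothetical experimental data: A4 holds a wild-type measurement.
experiment_df = pd.DataFrame({
    "plate": ["P1", "P1", "P1"],
    "well": ["A2", "A3", "A4"],
    "value": [0.51, 0.48, 0.93],
})

# Hypothetical identity table: mutants only, no entry for the WT well.
identity_df = pd.DataFrame({
    "plate": ["P1", "P1"],
    "well": ["A2", "A3"],
    "mutant_id": ["mut_001", "mut_002"],
})

# Inner join keeps only keys present in both tables, so the WT row is dropped.
inner = experiment_df.merge(identity_df, on=["plate", "well"], how="inner")

# Workaround: add WT rows to the identity table first, then inner join.
wt_rows = pd.DataFrame({"plate": ["P1"], "well": ["A4"], "mutant_id": ["WT"]})
identity_with_wt = pd.concat([identity_df, wt_rows], ignore_index=True)
fixed = experiment_df.merge(identity_with_wt, on=["plate", "well"], how="inner")
```

With this toy data, inner has two rows (the WT well is missing) and fixed has all three.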