ThomasMZheng closed 1 year ago
Created a "shrunk" version of the Vcode_map that drops all entries that were not sent for core analysis, and removed duplicate BQC IDs that came after the initial draw. (Might want to include this data afterwards.)
Duplicates were still present until "Extract" was also filtered by "Y". Removed one duplicate BQC entry (the VAP was the same for both). Removed one entry where BQC -> NULL.
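The filtering and de-duplication described above could be sketched roughly as follows in pandas. This is only an illustration on toy data: apart from "Extract", every column name here (`BQC_ID`, `VAP_ID`, `SentToCore`) is a hypothetical stand-in for whatever Vcode_mapping.xlsx actually uses, and it assumes the rows are already in draw order, so keeping the first occurrence keeps the initial draw.

```python
import pandas as pd

# Toy stand-in for Vcode_mapping.xlsx.
# Column names other than "Extract" are hypothetical.
vcode_map = pd.DataFrame({
    "BQC_ID": ["BQC1", "BQC1", "BQC2", "BQC3", None],
    "VAP_ID": ["VAP01", "VAP01", "VAP02", "VAP03", "VAP04"],
    "SentToCore": ["Y", "Y", "Y", "N", "Y"],
    "Extract": ["Y", "Y", "Y", "Y", "Y"],
})

def shrink_vcode_map(df):
    """Keep only analysed samples, then one row per BQC ID."""
    # Keep rows sent for core analysis AND flagged "Y" in "Extract".
    kept = df[(df["SentToCore"] == "Y") & (df["Extract"] == "Y")]
    # Drop rows with no BQC ID at all.
    kept = kept.dropna(subset=["BQC_ID"])
    # Assumes rows are in draw order: keep the first (initial) draw per BQC ID.
    kept = kept.drop_duplicates(subset=["BQC_ID"], keep="first")
    return kept.reset_index(drop=True)

shrunk = shrink_vcode_map(vcode_map)
print(shrunk)
```

Keeping the removed rows in a separate frame, rather than discarding them, would make it easy to include that data again later.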
Created a lookup table named "Phenotype-BQC-VAP_Map.csv", which links each VAP number to its BQC ID and to the phenotype data (Sex, COVID Status, Age).
---Phenotypic Linking---
I have the phenotypic data needed to link the proteomic data to the sex, age, and infection status of the patients.
Problem: The proteomic data (SS_x.adat) ties each protein measurement to a unique VAP ID (SubjectID), but the VAP ID alone gives no information on sex, age, or infection status.
We have a VCode map (Vcode_mapping.xlsx), which links the VAP ID to the BQC ID.
We also have a redcap map (redcap.csv) which links the BQC ID with the phenotypic data (sex, age, and infection status).
Solution: I need to link the phenotypic data to the VAP ID through the VCode map. A single table should be sufficient to add the relevant info while dropping any superfluous data. (This is a dataset combination issue.)
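One possible way to build that single table is a pair of pandas merges: VAP ID to BQC ID through the VCode map, then BQC ID to phenotype through the redcap export. The sketch below runs on toy data. All column names other than `SubjectID` (for example `VAP_ID`, `BQC_ID`, `COVID_Status`, `Site`) are assumptions, not the real headers of Vcode_mapping.xlsx or redcap.csv.

```python
import pandas as pd

# Toy stand-ins for the real files; column names are hypothetical.
vcode_map = pd.DataFrame({
    "VAP_ID": ["VAP01", "VAP02", "VAP03"],
    "BQC_ID": ["BQC1", "BQC2", "BQC3"],
})
redcap = pd.DataFrame({
    "BQC_ID": ["BQC1", "BQC2", "BQC4"],
    "Sex": ["F", "M", "F"],
    "Age": [54, 61, 47],
    "COVID_Status": ["Positive", "Negative", "Positive"],
    "Site": ["A", "B", "A"],  # superfluous column, dropped below
})
proteomic = pd.DataFrame({
    "SubjectID": ["VAP01", "VAP02", "VAP03"],
    "protein_a": [1.2, 3.4, 5.6],
})

def build_lookup(vcode_map, redcap):
    """Link each VAP ID to its BQC ID and phenotype columns."""
    pheno = redcap[["BQC_ID", "Sex", "COVID_Status", "Age"]]
    # validate raises if either side has repeated BQC IDs;
    # indicator adds a "_merge" column showing which rows found a match.
    return vcode_map.merge(
        pheno, on="BQC_ID", how="left",
        validate="one_to_one", indicator=True,
    )

lookup = build_lookup(vcode_map, redcap)

# VAP IDs with no phenotype record, worth checking by hand.
unmatched = lookup.loc[lookup["_merge"] == "left_only", "VAP_ID"].tolist()

# Attach phenotype to the proteomic rows via SubjectID (the VAP ID).
annotated = proteomic.merge(
    lookup.drop(columns="_merge"),
    left_on="SubjectID", right_on="VAP_ID", how="left",
)
print(annotated)
```

Using a left merge keeps every proteomic sample even when its phenotype is missing, so gaps show up as empty cells instead of silently dropped rows. The lookup itself could then be saved with `lookup.drop(columns="_merge").to_csv("Phenotype-BQC-VAP_Map.csv", index=False)`.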
---Dataset Combination---
I have 4 datasets to use; however, 2 of them are copies with the data median-normalized.
Regardless, I still have 2 unique datasets: one from November 2020 and the other from June 2021. I need to check whether there are any duplicate data entries across the datasets (I doubt it, but knowing this would allow for better batch control).
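A quick way to run that check, assuming both datasets have been loaded into pandas DataFrames that share the `SubjectID` column, is to intersect the ID sets. The frames and the `protein_a` column below are toy placeholders, not real measurements.

```python
import pandas as pd

# Toy placeholders for the November 2020 and June 2021 datasets.
nov_2020 = pd.DataFrame({
    "SubjectID": ["VAP01", "VAP02", "VAP03"],
    "protein_a": [1.0, 2.0, 3.0],
})
jun_2021 = pd.DataFrame({
    "SubjectID": ["VAP03", "VAP04"],
    "protein_a": [3.1, 4.0],
})

def shared_subjects(a, b, key="SubjectID"):
    """Return the sorted IDs that appear in both datasets."""
    return sorted(set(a[key]) & set(b[key]))

overlap = shared_subjects(nov_2020, jun_2021)
print(overlap)
```

Any subjects that do turn up in both runs could serve as bridging samples when comparing the batches.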
I just talked with Chen-Yang, and he mentioned that some recent papers do not do batch correction, so I need to read up on the literature.
Aside from that, it appears that I can just concatenate the two datasets together.
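A minimal sketch of that concatenation, again on toy frames with a hypothetical `protein_a` column: it checks that both datasets have the same columns and tags each row with a `Batch` label (a name introduced here for illustration), so batch correction remains an option whichever way the literature points.

```python
import pandas as pd

# Toy placeholders for the two unique datasets.
nov_2020 = pd.DataFrame({
    "SubjectID": ["VAP01", "VAP02"],
    "protein_a": [1.0, 2.0],
})
jun_2021 = pd.DataFrame({
    "SubjectID": ["VAP03", "VAP04"],
    "protein_a": [3.0, 4.0],
})

def concat_batches(batches):
    """Stack datasets row-wise; `batches` maps a batch label to its DataFrame."""
    frames = list(batches.values())
    first_cols = list(frames[0].columns)
    for label, frame in batches.items():
        # Refuse to stack datasets that measure different columns.
        if set(frame.columns) != set(first_cols):
            raise ValueError(f"Batch {label} has different columns")
    # Align column order and record which batch each row came from.
    labelled = [
        frame[first_cols].assign(Batch=label)
        for label, frame in batches.items()
    ]
    return pd.concat(labelled, ignore_index=True)

combined = concat_batches({"2020-11": nov_2020, "2021-06": jun_2021})
print(combined)
```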