ThomasMZheng / Proteomics-BQC

All of the work done while part of the Richards Lab will be here.

2023/10/26 #8

Closed ThomasMZheng closed 1 year ago

ThomasMZheng commented 1 year ago

---Phenotypic Linking---

I have the phenotypic data needed to link the proteomic data to each patient's sex, age, and infection status.

Problem: The proteomic data (SS_x.adat) identifies each sample by a unique VAP ID (SubjectID); however, that ID carries no information on sex, age, or infection status.

We have a VCode map (Vcode_mapping.xlsx) that links the VAP ID to the BQC ID.

We also have a REDCap map (redcap.csv) that links the BQC ID to the phenotypic data (sex, age, and infection status).

Solution: I need to link the phenotypic data to the VAP ID through the VCode map. A single table should be sufficient to add the relevant info while also dropping any superfluous data (this is a dataset combination issue).
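A minimal sketch of the two-hop link (VAP ID -> BQC ID -> phenotype), assuming the VCode map and the REDCap export have already been read into lists of dicts (for example with `csv.DictReader`). The column names `VAP_ID`, `BQC_ID`, `sex`, `age`, and `covid_status` are hypothetical placeholders; the real headers in Vcode_mapping.xlsx and redcap.csv may differ. Only `SubjectID` comes from the proteomic file as described above.

```python
from typing import Dict, List

# Hypothetical column names; replace with the real headers from the source files.
VAP_COL = "VAP_ID"
BQC_COL = "BQC_ID"
PHENO_COLS = ("sex", "age", "covid_status")


def link_phenotypes(
    samples: List[Dict[str, str]],
    vcode_map: List[Dict[str, str]],
    redcap: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Attach phenotype columns to proteomic samples via VAP ID -> BQC ID -> phenotype.

    Samples whose VAP ID or BQC ID cannot be resolved are dropped
    (inner-join semantics), so the output holds only fully linked rows.
    """
    # If a key appears twice, the later row silently wins, so the maps
    # should be deduplicated before this step.
    vap_to_bqc = {row[VAP_COL]: row[BQC_COL] for row in vcode_map}
    bqc_to_pheno = {
        row[BQC_COL]: {col: row[col] for col in PHENO_COLS} for row in redcap
    }
    linked: List[Dict[str, str]] = []
    for sample in samples:
        bqc = vap_to_bqc.get(sample["SubjectID"])
        pheno = bqc_to_pheno.get(bqc)
        if pheno is None:
            continue  # no phenotype available for this sample
        linked.append({**sample, BQC_COL: bqc, **pheno})
    return linked
```

Using plain dict lookups keeps the sketch dependency-free; with pandas the same idea would be two successive inner merges. Because duplicate keys are overwritten rather than reported, the map cleaning described in the later comments matters before running a join like this.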

---Dataset Combination---

I have 4 datasets to use; however, 2 of them are copies of the others with the data normalized to the median norm intensity.

Regardless, I still have 2 unique datasets: one from November 2020 and the other from June 2021. I need to check whether there are any duplicate data entries across the datasets (I doubt it, but shared entries would allow for better batch control).
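A quick way to check for overlap is to intersect the sets of subject identifiers from the two batches. This sketch assumes each dataset has been loaded as a list of dicts and that a repeated `SubjectID` is what counts as a duplicate entry; if the same subject can legitimately appear under a different sample ID, a different key would be needed.

```python
from typing import Dict, Iterable, List


def shared_subject_ids(
    batch_a: Iterable[Dict[str, str]], batch_b: Iterable[Dict[str, str]]
) -> List[str]:
    """Return the SubjectIDs present in both batches, sorted for stable output."""
    ids_a = {row["SubjectID"] for row in batch_a}
    ids_b = {row["SubjectID"] for row in batch_b}
    return sorted(ids_a & ids_b)
```

An empty result means the two batches share no subjects; a non-empty one lists candidate bridging samples that could anchor a batch comparison.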

I just talked with Chen-Yang, and he mentioned that some recent papers do not do batch correction, so I need to read up on the literature.

Aside from that, it appears that I can just concatenate the two datasets together.
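Concatenation can be sketched as stacking the rows of both batches while recording which batch each row came from, so that batch correction can still be applied (or skipped) later. The `batch` column name and the labels used below are hypothetical, and the sketch assumes the two batches might not share exactly the same protein columns, filling any gaps with an empty string.

```python
from typing import Dict, List


def concat_batches(batches: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Stack rows from several batches and tag each row with its batch label.

    The output columns are the union of all input columns (first-seen order);
    a column missing from a row is filled with an empty string.
    """
    all_cols = list(
        dict.fromkeys(col for rows in batches.values() for row in rows for col in row)
    )
    combined: List[Dict[str, str]] = []
    for label, rows in batches.items():
        for row in rows:
            merged = {col: row.get(col, "") for col in all_cols}
            merged["batch"] = label  # keep provenance for optional batch correction
            combined.append(merged)
    return combined
```

For example, `concat_batches({"2020-11": nov_rows, "2021-06": jun_rows})` yields one table in which every row carries its origin. Keeping the label costs nothing and leaves the batch-correction decision open until the literature review is done.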

ThomasMZheng commented 1 year ago


Created a "shrunk" version of the Vcode_map that removes all entries not sent for core analysis and drops duplicate BQC IDs from draws after the initial one. (Might want to include the later-draw data afterwards.)
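The shrinking step can be sketched as a filter followed by "keep the earliest draw per BQC ID". The column names `sent_to_core`, `draw`, and `BQC_ID`, the `"Y"` flag value for that column, and the assumption that draw order is stored as an integer are all hypothetical; the actual Vcode_mapping.xlsx layout may encode these differently.

```python
from typing import Dict, List


def shrink_vcode_map(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows sent for core analysis, then only the earliest draw per BQC ID.

    Column names ("sent_to_core", "draw", "BQC_ID") are hypothetical.
    Returns a new list; the input rows are left untouched.
    """
    sent = [row for row in rows if row["sent_to_core"] == "Y"]
    earliest: Dict[str, Dict[str, str]] = {}
    for row in sent:
        bqc = row["BQC_ID"]
        # Replace the stored row only if this one comes from an earlier draw.
        if bqc not in earliest or int(row["draw"]) < int(earliest[bqc]["draw"]):
            earliest[bqc] = row
    return list(earliest.values())
```

Since the later-draw rows might be wanted afterwards, the function builds a new list instead of mutating the full map.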

ThomasMZheng commented 1 year ago

Duplicates were still present until the "Extract" column was also filtered to "Y". Removed one duplicate BQC ID (it mapped to the same VAP ID in both rows) and removed one entry whose BQC ID mapped to NULL.
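The cleanup in this comment can be sketched as: filter on `Extract == "Y"`, drop rows with a missing ID, collapse identical BQC -> VAP pairs, and fail loudly if any BQC ID still points at more than one VAP ID. `Extract` and `"Y"` come from the comment; the `BQC_ID`/`VAP_ID` column names and the set of tokens treated as null are assumptions.

```python
from collections import Counter
from typing import Dict, List

# Assumed spellings of a missing value in the spreadsheet export.
NULL_TOKENS = {"", "NULL", "NA"}


def clean_vcode_map(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Filter Extract == "Y", drop null IDs, and collapse duplicate BQC -> VAP pairs.

    Raises ValueError if a BQC ID still maps to more than one VAP ID.
    """
    kept = [
        row
        for row in rows
        if row["Extract"] == "Y"
        and row["BQC_ID"].strip() not in NULL_TOKENS
        and row["VAP_ID"].strip() not in NULL_TOKENS
    ]
    # dict.fromkeys keeps the first occurrence of each identical pair, in order.
    pairs = list(dict.fromkeys((row["BQC_ID"], row["VAP_ID"]) for row in kept))
    counts = Counter(bqc for bqc, _ in pairs)
    conflicts = sorted(bqc for bqc, n in counts.items() if n > 1)
    if conflicts:
        raise ValueError(f"BQC IDs mapped to multiple VAP IDs: {conflicts}")
    return [{"BQC_ID": bqc, "VAP_ID": vap} for bqc, vap in pairs]
```

Raising on conflicts, rather than picking one VAP ID silently, separates harmless exact duplicates (like the one removed here) from genuine mapping disagreements that need a manual look.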

ThomasMZheng commented 1 year ago

Created a lookup table named "Phenotype-BQC-VAP_Map.csv" that links each VAP number to its BQC ID and phenotype data (Sex, COVID Status, Age).
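Writing and re-reading a lookup table like this can be done with the standard `csv` module. The header names below follow the fields listed in the comment, but their exact spellings in Phenotype-BQC-VAP_Map.csv, and the choice to sort rows by VAP number, are assumptions.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column order for the lookup table; exact header spellings are assumed.
FIELDS = ["VAP", "BQC", "Sex", "COVID Status", "Age"]


def write_lookup(rows: List[Dict[str, str]], handle: TextIO) -> None:
    """Write the VAP -> BQC -> phenotype lookup as CSV, dropping any extra columns."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["VAP"]):
        writer.writerow(row)


def read_lookup(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Read the lookup table back, keyed by VAP number."""
    return {row["VAP"]: row for row in csv.DictReader(handle)}


if __name__ == "__main__":
    # Toy values for illustration only.
    buf = io.StringIO()
    write_lookup(
        [{"VAP": "VAP001", "BQC": "BQC100", "Sex": "F",
          "COVID Status": "positive", "Age": "54"}],
        buf,
    )
    buf.seek(0)
    print(read_lookup(buf))
```

With a real file, open it with `open("Phenotype-BQC-VAP_Map.csv", "w", newline="")` as the `csv` documentation recommends; `extrasaction="ignore"` is what lets superfluous columns be dropped on write.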

ThomasMZheng commented 1 year ago

Need to do more work on combining the datasets in code.