broadinstitute / lincs-profiling-complementarity

Analyzing and comparing signal found in different profiling technologies
BSD 3-Clause "New" or "Revised" License

[Response to Review] Filter GO terms for ML analysis #67

Closed gwaybio closed 2 years ago

gwaybio commented 2 years ago

In #59, I added additional Y matrices for use in our ML pipeline. One of these matrices included GO term annotations per compound. However, as of that PR it contained over 5,000 different terms.

Here, I filter the GO terms to keep only those annotated for at least 20 compounds. Many GO terms (about 1,000) had annotations for only 1 compound.

This filtering step reduced the matrix to 772 unique GO terms, which is much better suited for training multi-label classifiers in part 2 of the ML analysis.
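For illustration, the thresholding described above can be sketched as follows. This is a minimal sketch and not necessarily the code in this PR: it assumes the Y matrix is a pandas DataFrame with one row per compound and one binary column per GO term, and the function name `filter_go_terms`, the toy compound IDs, and the toy GO term labels are all hypothetical.

```python
import pandas as pd


def filter_go_terms(go_matrix: pd.DataFrame, min_compounds: int = 20) -> pd.DataFrame:
    """Keep GO term columns annotated for at least `min_compounds` compounds.

    Assumes rows are compounds and columns are GO terms, with a nonzero
    entry meaning "this compound carries this annotation" (an assumption
    about the matrix layout, not confirmed by the PR text).
    """
    # Count, per GO term (column), how many compounds (rows) are annotated
    compounds_per_term = (go_matrix != 0).sum(axis=0)
    # Keep terms that meet the threshold (greater than or equal to)
    keep = compounds_per_term[compounds_per_term >= min_compounds].index
    return go_matrix.loc[:, keep]


# Toy example with hypothetical labels and a lowered threshold of 2
toy = pd.DataFrame(
    {
        "GO:A": [1, 1, 1, 0],  # annotated for 3 compounds
        "GO:B": [1, 0, 0, 0],  # annotated for 1 compound
        "GO:C": [1, 1, 0, 0],  # annotated for 2 compounds
    },
    index=["cpd_1", "cpd_2", "cpd_3", "cpd_4"],
)
filtered = filter_go_terms(toy, min_compounds=2)
print(list(filtered.columns))  # ['GO:A', 'GO:C']
```

With the real matrix, calling the function with `min_compounds=20` would correspond to the cutoff used here.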

See #60 for more details