cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

Improve Quick Search for Patients and Samples #77

Closed Luke-Sikina closed 4 years ago

Luke-Sikina commented 4 years ago

Background: Quick search is a way to view studies, genes, patients, and samples quickly from the main page of the website. You can see quick search in action by going to cbioportal.org, clicking on the Quick Search tab, and entering something in the search bar (BRAF will get you a good spread of results). Right now, quick search looks for patients and samples in a fairly primitive manner: for samples it matches only on the sample ID, and for patients it matches on the patient ID or the IDs of the patient's samples.

Goal: Make quick search for patients and samples match on genes the patient has mutations in, associated protein changes, clinical attributes, and clinical events.

Approach: All file paths referenced in this section refer to paths within the cbioportal backend repo.

The current matching logic for patients resides in src/main/resources/org/cbioportal/persistence/mybatis/PatientMapper.xml; the logic for samples is in src/main/resources/org/cbioportal/persistence/mybatis/SampleMapper.xml. These files need to be changed so that the keyword also matches on the patient's or sample's mutated genes, the corresponding protein changes, clinical attributes, and clinical events. This information is stored in different tables; to see the relationships between the tables, you can look at either src/main/resources/cgds.sql or this database diagram.
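To make the multi-table match concrete, here is a minimal sketch of the kind of query the mappers would need. It runs against an in-memory SQLite database rather than the real backend, and the table and column names are a simplified, hypothetical stand-in for what cgds.sql defines; clinical events are left out for brevity. The function names (`build_demo_db`, `search_patients`) are illustrative only.

```python
import sqlite3

# Simplified, hypothetical schema; the real tables live in cgds.sql.
SCHEMA = """
CREATE TABLE patient (internal_id INTEGER PRIMARY KEY, stable_id TEXT);
CREATE TABLE sample (internal_id INTEGER PRIMARY KEY, stable_id TEXT, patient_id INTEGER);
CREATE TABLE gene (entrez_gene_id INTEGER PRIMARY KEY, hugo_gene_symbol TEXT);
CREATE TABLE mutation (sample_id INTEGER, entrez_gene_id INTEGER, protein_change TEXT);
CREATE TABLE clinical_patient (patient_id INTEGER, attr_id TEXT, attr_value TEXT);
"""


def build_demo_db():
    """Create an in-memory database with three made-up patients."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?)",
                     [(1, "P-001"), (2, "P-002"), (3, "P-003")])
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                     [(1, "S-001", 1), (2, "S-002", 2)])
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(673, "BRAF"), (7157, "TP53")])
    conn.executemany("INSERT INTO mutation VALUES (?, ?, ?)",
                     [(1, 673, "V600E"), (2, 7157, "R175H")])
    conn.execute("INSERT INTO clinical_patient VALUES (3, 'CANCER_TYPE', 'Melanoma')")
    conn.commit()
    return conn


def search_patients(conn, keyword, limit=20):
    """Return stable IDs of patients matching the keyword on any joined field."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.stable_id
        FROM patient p
        LEFT JOIN sample s ON s.patient_id = p.internal_id
        LEFT JOIN mutation m ON m.sample_id = s.internal_id
        LEFT JOIN gene g ON g.entrez_gene_id = m.entrez_gene_id
        LEFT JOIN clinical_patient c ON c.patient_id = p.internal_id
        WHERE p.stable_id LIKE :kw
           OR s.stable_id LIKE :kw
           OR g.hugo_gene_symbol LIKE :kw
           OR m.protein_change LIKE :kw
           OR c.attr_value LIKE :kw
        ORDER BY p.stable_id
        LIMIT :limit
        """,
        {"kw": f"%{keyword}%", "limit": limit},
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(search_patients(demo, "BRAF"))
    print(search_patients(demo, "Melanoma"))
```

The LEFT JOINs keep patients that have no samples or mutations searchable by ID or clinical attribute. In the actual mappers the same idea would be written as MyBatis XML against the real schema, and a substring match like this may be too slow on large tables.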

These endpoints are called as you type in the quick search bar, so the endpoints have to remain performant even though you're adding functionality. Try to keep response times subsecond.
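One way to keep an eye on the response-time target while experimenting is to time candidate queries against synthetic data. The sketch below is only an illustration: it uses an in-memory SQLite database, a hypothetical two-table schema, an exact match on an indexed gene-symbol column, and a LIMIT on the result size. None of these choices is confirmed as the project's approach, and timings on the production database will differ.

```python
import sqlite3
import time

BUDGET_SECONDS = 1.0  # the "subsecond" target mentioned above


def build_synthetic_db(n_patients=5000):
    """Create a hypothetical schema filled with synthetic rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (internal_id INTEGER PRIMARY KEY, stable_id TEXT);
        CREATE TABLE mutation (patient_id INTEGER, hugo_gene_symbol TEXT);
        CREATE INDEX idx_mutation_gene ON mutation (hugo_gene_symbol);
    """)
    conn.executemany("INSERT INTO patient VALUES (?, ?)",
                     [(i, f"P-{i:05d}") for i in range(n_patients)])
    genes = ["BRAF", "TP53", "KRAS", "EGFR"]
    conn.executemany("INSERT INTO mutation VALUES (?, ?)",
                     [(i, genes[i % len(genes)]) for i in range(n_patients)])
    conn.commit()
    return conn


def timed_search(conn, gene_symbol, limit=20):
    """Run the query and return (matching patient IDs, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        """
        SELECT DISTINCT p.stable_id
        FROM mutation m
        JOIN patient p ON p.internal_id = m.patient_id
        WHERE m.hugo_gene_symbol = ?
        ORDER BY p.stable_id
        LIMIT ?
        """,
        (gene_symbol, limit),
    ).fetchall()
    elapsed = time.perf_counter() - start
    return [row[0] for row in rows], elapsed


if __name__ == "__main__":
    db = build_synthetic_db()
    ids, seconds = timed_search(db, "BRAF")
    print(f"{len(ids)} results in {seconds:.4f}s (budget {BUDGET_SECONDS}s)")
```

Capping the result count matters here because quick search only displays a handful of hits per keystroke, so there is no need to materialize every match.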

Needed skills: SQL, Java

Possible mentors: Luke

pieterlukasse commented 4 years ago

Nice 👍. Maybe also include finding patients/samples by clinical events (a.k.a. timeline entries).

Luke-Sikina commented 4 years ago

I understand the concerns about this being too complex given how we want to display the results. I'm going to close this issue for now so that no one gets confused.