cBioPortal / GSoC

Documentation repository of Google Summer of Code (GSoC) project ideas for cBioPortal and related projects

[r-client] cBioPortalData: Example Bioconductor Workflow #83

Open LiNk-NY opened 4 years ago

LiNk-NY commented 4 years ago

Background:

The cBioPortal R client opens up cancer sequencing data hosted on the cBioPortal for Cancer Genomics to alternative analysis platforms such as Bioconductor, an open-source software project for bioinformatics built on R.

Bioconductor provides many workflows that demonstrate use cases for particular packages, analyses, visualizations, and technologies.

All of the available Bioconductor workflows may be found here: http://bioconductor.org/packages/release/BiocViews.html#___Workflow

The cBioPortal provides a REST API for programmatic access to the data and leverages this same service to generate the visualizations and reports seen throughout the site. Although the visualizations and reports already provided by the cBioPortal are extensive, users may need customizations that the cBioPortal itself does not yet support. Connecting to the API directly allows anyone to build their own custom visualizations and reports to suit their needs.

Users may access the REST API through command line tools, such as curl, or through API clients. The cBioPortal team has made two such API clients available: one written in R and another written in Python. More information on these API clients and how to access and use them can be found here.
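As a rough illustration of direct REST access without either client, the minimal sketch below lists studies using only the Python standard library. It assumes the public instance serves its API under `https://www.cbioportal.org/api`, that the `/studies` endpoint accepts a `keyword` query parameter, and that each returned study object carries `studyId` and `name` fields. The helper names are hypothetical and not part of any cBioPortal client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the public cBioPortal REST API.
BASE_URL = "https://www.cbioportal.org/api"


def build_studies_url(keyword=None, base_url=BASE_URL):
    """Return the URL that lists studies, optionally filtered by a keyword."""
    url = f"{base_url}/studies"
    if keyword:
        # Assumption: the endpoint filters on a `keyword` query parameter.
        url += "?" + urlencode({"keyword": keyword})
    return url


def fetch_studies(keyword=None, timeout=30):
    """Download the study list and parse the JSON body (needs network access)."""
    with urlopen(build_studies_url(keyword), timeout=timeout) as response:
        return json.load(response)


def summarize_studies(studies):
    """Map each study identifier to its display name (empty if missing)."""
    # Assumption: every study object has a `studyId`; `name` may be absent.
    return {study["studyId"]: study.get("name", "") for study in studies}
```

With network access, `summarize_studies(fetch_studies("leukemia"))` would return a dictionary of matching study identifiers and names. The URL builder and the summarizer are kept separate from the download step so they can be checked offline.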

R is one of the leading programming languages in Data Science. As such, building an example Bioconductor workflow demonstrating the use of the cBioPortalData R client will be greatly beneficial to the cancer research community as a whole by making analyses and visualization of cancer sequencing data even more accessible.


Goal: To create an example Bioconductor workflow and iPython notebook demonstrating the use of the cBioPortalData R client and a general Bioconductor approach to data analysis, and to write supporting functions for visualizing and parsing metadata from the cBioPortalData endpoints as provided in the MultiAssayExperiment object obtained from cBioPortalData.

Approach:


Needed skills:

Possible mentors: @LiNk-NY @lwaldron

banerjeeshayantan commented 4 years ago

I work in the area of cancer informatics and develop machine learning models to distinguish between driver and passenger mutations. This link (http://bit.ly/projects_list_2019) contains more details about my work. I have extensively used R/Bioconductor for my research. Can I take up this project? I am aware that the GSoC application period is over, but I want to contribute anyway.

alisman commented 4 years ago

Hi Shayantan,

Thanks so much for your offer. I will bring this up at our meeting this afternoon and get back to you.

--Aaron

lwaldron commented 4 years ago

Dear @banerjeeshayantan, thanks for your interest! It would be great to have you take up this project. We can start a project and define some concrete issues at https://github.com/waldronlab/cBioPortalData/projects. Would be happy to set up a call to meet and discuss.

lwaldron commented 4 years ago

Update: I have created a long list of potential TODO items ranging from relatively quick to potentially hard at https://github.com/waldronlab/cBioPortalData/projects. @LiNk-NY @lgeistlinger feel free to edit/add.

banerjeeshayantan commented 4 years ago

I apologise for not replying earlier due to some other commitments. Can we talk sometime this week? I have already downloaded the package with all the necessary dependencies. Please let me know.

lwaldron commented 4 years ago

No problem @banerjeeshayantan, thanks for your continued interest. I will touch base with @LiNk-NY today to plan, and we'll be in touch again soon.

martinnnuez commented 2 years ago

My name is Martin Rodríguez Nuñez. I graduated in 2020 as an environmental engineer from the National University of Córdoba, Argentina (UNC). I am currently enrolled in a PhD program in engineering sciences focused on modeling fine particulate matter (PM2.5) levels using meteorological, geographic, remote sensing and land use variables as predictors. After finishing college I started a master's degree in applied statistics, where I realized that this was my true passion. Only the thesis remains for me to obtain my master's degree.

I have a passion for statistical data analysis and predictive modeling, and I have experience with these topics in R and Python. I am currently working on time series data analysis and predictive modeling.

I am interested in the project and especially its goal, since it will be very beneficial to the cancer research community. I have no experience in this particular topic, but I know a lot about R and data analysis and I am confident that I am well suited to it. Besides, having a professional as a mentor would help me enhance my skills.

Before submitting an application I have some questions:

1. What is the process to apply for the position, and is it still available?
2. Do you know which support functions you would like to develop?

Thank you very much in advance. I look forward to hearing from you.

Best regards, Martin Rodriguez Nuñez.

lwaldron commented 2 years ago

Hi @martinnnuez, thanks for your interest! I think we have a good project for someone of your background, focusing on increasing the coverage of cBioPortal data imported by its Bioconductor client. There are many datasets which it fails to import for one reason or another (https://waldronlab.io/cBioPortalData/articles/cBioPortalDataErrors.html) that will require some combination of custom dataloaders, additional rules for the existing one, or correction of the data to resolve. @LiNk-NY is the developer and can describe in more detail, then we could all meet on a zoom call.

LiNk-NY commented 2 years ago

Hi Martin, @martinnnuez

Thank you for your interest! We are glad to have you help us. Please let us know what day works for you and either Levi or I (or both) can meet with you. I can go over the details with regard to getting the datasets into analysis-ready shape.

Saludos, Marcel

martinnnuez commented 2 years ago

Good morning, and sorry for the delay in answering. From a previous message I understood that Levi was proposing another project, which confused me a little. I am available to talk about any project to which you think I can contribute within the framework of Google Summer of Code. Let's arrange a call for next week or the following week, whenever you are available. Hope to hear from you soon. Martin



lwaldron commented 2 years ago

I think they're both viable projects, one documentation-focused, the other data wrangling-focused, so it will depend a bit on your interests. To tell you a little more about my suggestion, at https://waldronlab.io/cBioPortalData/articles/cBioPortalDataErrors.html you'll see we still have a lot of dataset build errors, which I would love to reduce or eliminate. These may be a fair bit of work to resolve since Marcel has already fixed a lot of the lower-hanging fruit. But between handling special cases on the R side, and potentially correcting some curation anomalies on the cBioPortal side, these should all be fixable. If we could get these all fixed, we could make a unit test on building a MultiAssayExperiment part of the curation process for cBioPortal and even stay at no build errors.

I'd suggest you meet with Marcel first, since no one knows more about MultiAssayExperiment and cBioPortalData than him. I'd certainly like to meet sometime but I'm pretty hectic during the next couple weeks.

martinnnuez commented 2 years ago

Perfect, @LiNk-NY let me know when we can coordinate a meeting. Thank you very much.

LiNk-NY commented 2 years ago

Hi Martin, @martinnnuez

I can meet on Friday or next week. Afternoons work best for me. You can reach me on the Bioc-community Slack (my handle there is mramos148). Register at https://bioc-community.herokuapp.com/. Looking forward to meeting with you!

Best, Marcel

imsarath commented 2 years ago

Hi, @LiNk-NY @lwaldron,

I am interested in working on this project. Having read Levi's comment above, I would like to know more about the data wrangling aspect of it. Please let me know how to proceed.

Thanks, Sarath

lwaldron commented 2 years ago

Great, Sarath! We should have a group call perhaps next week to sync up. Do you have time limitations? @link-ny let’s talk this week then propose some times. --

Levi Waldron


LiNk-NY commented 2 years ago

Hi Sarath, @imsarath Any updates on this? Are you still interested? Thanks! -Marcel

bhavy2202 commented 1 year ago

Hey, is this issue still open for GSoC '23? I'm interested in working on this project. Thank you.