LiNk-NY opened this issue 4 years ago
I work in cancer informatics and develop machine learning models to distinguish between driver and passenger mutations. This link (http://bit.ly/projects_list_2019) contains more details about my work. I have used R/Bioconductor extensively in my research. Can I take up this project? I am aware that the GSoC application period is over, but I would like to contribute anyway.
Hi Shayantan,
Thanks very much for your offer. I will bring this up at our meeting this afternoon and get back to you.
--Aaron
(Sent by email in reply to Shayantan Banerjee's comment of Sun, Apr 12, 2020, 4:35 AM.)
Dear @banerjeeshayantan, thanks for your interest! It would be great to have you take up this project. We can start a project and define some concrete issues at https://github.com/waldronlab/cBioPortalData/projects. I would be happy to set up a call to meet and discuss.
Update: I have created a long list of potential TODO items ranging from relatively quick to potentially hard at https://github.com/waldronlab/cBioPortalData/projects. @LiNk-NY @lgeistlinger feel free to edit/add.
I apologise for not replying earlier; I had some other commitments. Can we talk sometime this week? I have already downloaded the package with all the necessary dependencies. Please let me know.
No problem @banerjeeshayantan, thanks for your continued interest. I will touch base with @LiNk-NY today to plan, and we'll be in touch again soon.
My name is Martin Rodríguez Nuñez. I graduated in 2020 as an environmental engineer from the National University of Córdoba, Argentina (UNC). I am currently enrolled in a PhD program in engineering sciences, focused on modeling fine particulate matter (PM2.5) levels using meteorological, geographic, remote sensing, and land use variables as predictors. After finishing college I started a master's degree in applied statistics, where I realized that this was my true passion; only my thesis remains to complete that degree. I am passionate about statistical data analysis and predictive modeling, and I have experience with both in R and Python. I am currently working on time-series analysis and forecasting. I am interested in the project, and especially in its goal, since it will be very beneficial to the cancer research community. I have no experience in this particular field, but I know R and data analysis well and am confident I can do the work; besides, having a professional as a mentor would help me improve my skills. Before submitting an application I have some questions: 1- What is the process to apply for the position, and is it still available? 2- Do you already know which supporting functions you would like to develop? Thank you very much in advance. I look forward to hearing from you. Best regards, Martin Rodríguez Nuñez.
Hi @martinnnuez, thanks for your interest! I think we have a good project for someone with your background, focused on increasing the coverage of cBioPortal data imported by its Bioconductor client. Many datasets fail to import for one reason or another (https://waldronlab.io/cBioPortalData/articles/cBioPortalDataErrors.html), and resolving them will require some combination of custom data loaders, additional rules for the existing loader, or corrections to the data. @LiNk-NY is the developer and can describe this in more detail; then we could all meet on a Zoom call.
Hi Martin, @martinnnuez
Thank you for your interest! We are glad to have you help us. Please let us know what day works for you and either Levi or I (or both of us) can meet with you. I can go over the details of getting the datasets into analysis-ready shape.
Regards, Marcel
Good morning, and sorry for the delayed reply. From an earlier message I understood that Levi was proposing a different project, which confused me a little. I am available to discuss any project you think I could contribute to within Google Summer of Code. Let's plan for next week or the week after, whenever you are available. Hope to hear from you soon. Martin
(Sent by email in reply to Marcel Ramos's message of Monday, March 28, 2022, 11:18; subject: "Re: [cBioPortal/GSoC] [r-client] cBioPortalData: Example Bioconductor Workflow (#83)".)
I think they're both viable projects, one documentation-focused and the other focused on data wrangling, so it will depend a bit on your interests. To tell you a little more about my suggestion: at https://waldronlab.io/cBioPortalData/articles/cBioPortalDataErrors.html you'll see that we still have a lot of dataset build errors, which I would love to reduce or eliminate. These may take a fair bit of work to resolve, since Marcel has already fixed much of the lower-hanging fruit. But between handling special cases on the R side and correcting some curation anomalies on the cBioPortal side, these should all be fixable. If we could fix them all, we could make a unit test that builds a MultiAssayExperiment part of the cBioPortal curation process, and keep the number of build errors at zero.
I'd suggest you meet with Marcel first, since no one knows more about MultiAssayExperiment and cBioPortalData than he does. I'd certainly like to meet sometime, but my schedule is pretty hectic for the next couple of weeks.
Perfect, @LiNk-NY, let me know when we can meet. Thank you very much.
Hi Martin, @martinnnuez
I can meet on Friday or next week.
Afternoons work best for me.
You can reach me on the Bioc-community Slack (my handle there is mramos148).
Register at https://bioc-community.herokuapp.com/
Looking forward to meeting with you!
Best, Marcel
Hi, @LiNk-NY @lwaldron,
I am interested in working on this project. Having read Levi's comment above, I would like to know more about its data wrangling aspect. Please let me know how to proceed.
Thanks, Sarath
Great, Sarath! We should perhaps have a group call next week to sync up. Do you have any scheduling constraints? @link-ny, let's talk this week and then propose some times. --
Levi Waldron
Associate Professor
Department of Epidemiology and Biostatistics
CUNY Graduate School of Public Health and Health Policy
Institute for Implementation Science in Population Health
55 W 125th St, New York NY 10035
Hi Sarath, @imsarath Any updates on this? Are you still interested? Thanks! -Marcel
Hey, is this issue still open for GSoC 2023? I'm interested in working on this project. Thank you.
Background:
The cBioPortal R client opens up cancer sequencing data hosted on the cBioPortal for Cancer Genomics to alternative analysis platforms such as Bioconductor, an open-source software project for bioinformatics built on R.
Bioconductor provides many workflows for demonstrating use-cases for particular packages, analyses, visualizations, and technologies including (but not limited to):
All of the available Bioconductor workflows may be found here: http://bioconductor.org/packages/release/BiocViews.html#___Workflow
The cBioPortal provides a REST API for programmatic access to its data and uses this same service to generate the visualizations and reports seen throughout the site. Although the visualizations and reports already provided by the cBioPortal are extensive, users may need customizations for their specific needs that the cBioPortal itself cannot yet provide. Connecting to the API directly allows anyone to build their own custom visualizations and reports.
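As a minimal sketch of direct API access, the Python snippet below builds a request for a page of studies and reduces each returned study record to its identifier and name. It assumes the public instance's base URL https://www.cbioportal.org/api, a study-listing endpoint at /studies, the query parameters projection, pageSize and pageNumber, and studyId/name fields in the JSON response. The helper names (studies_url, summarize_studies, fetch_studies) are hypothetical; this is an illustration, not one of the official API clients.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public cBioPortal instance's REST API.
BASE_URL = "https://www.cbioportal.org/api"


def studies_url(page_size=10, page_number=0, projection="SUMMARY"):
    """Build the URL for the (assumed) GET /studies listing endpoint."""
    query = urllib.parse.urlencode(
        {"projection": projection, "pageSize": page_size, "pageNumber": page_number}
    )
    return f"{BASE_URL}/studies?{query}"


def summarize_studies(studies):
    """Reduce decoded JSON study records to (studyId, name) pairs."""
    return [(study["studyId"], study["name"]) for study in studies]


def fetch_studies(page_size=10):
    """Request one page of studies and return (studyId, name) pairs.

    This performs a live HTTP request, so it needs network access.
    """
    request = urllib.request.Request(
        studies_url(page_size=page_size),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_studies(json.load(response))
```

For example, fetch_studies(page_size=5) would return up to five (studyId, name) pairs from the live server. Keeping URL construction and parsing in separate functions means both can be checked without network access.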
Users may access the REST API through command line tools, such as curl, or through API clients. The cBioPortal team has made two such API clients available: one written in R and another written in Python. More information on these API clients and how to access and use them can be found here.
R is one of the leading programming languages in data science. Building an example Bioconductor workflow that demonstrates the cBioPortalData R client will therefore benefit the cancer research community as a whole, by making the analysis and visualization of cancer sequencing data even more accessible.
Goal:
To create an example Bioconductor workflow and iPython notebook demonstrating the use of the cBioPortalData R client and a general Bioconductor approach to data analysis. To write supporting functions for visualizing and parsing metadata from the cBioPortalData endpoints, as provided in the MultiAssayExperiment object obtained from cBioPortalData.
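The metadata-parsing helpers named in the goal would be R functions working on the MultiAssayExperiment returned by cBioPortalData. Purely to illustrate one such parsing step, the hypothetical Python helper below pivots long-format clinical records (one record per sample and attribute) into one record per sample. It assumes each record carries a sampleId (or patientId), a clinicalAttributeId and a value, mirroring the field names in the REST API's clinical-data JSON; this is a sketch, not the project's design.

```python
from collections import defaultdict


def clinical_long_to_wide(records, id_field="sampleId"):
    """Pivot long-format clinical records into one dict per sample or patient.

    Each record is assumed to hold an identifier (id_field), a
    'clinicalAttributeId' and a 'value', mirroring the field names of the
    cBioPortal REST API's clinical-data JSON.
    """
    rows = defaultdict(dict)
    for record in records:
        rows[record[id_field]][record["clinicalAttributeId"]] = record["value"]

    # Union of all attributes seen, so every row ends up with the same columns.
    columns = sorted({attr for row in rows.values() for attr in row})
    return {
        sample_id: {column: row.get(column) for column in columns}
        for sample_id, row in rows.items()
    }
```

Collecting the union of attributes first, and filling gaps with None, keeps the output rectangular even when not every sample has every attribute; passing id_field="patientId" applies the same reshaping to patient-level records.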
Approach:
trackViewer
Needed skills: R (analysis and package development), Bioconductor
Possible mentors: @LiNk-NY @lwaldron