DataBiosphere / data-portal

Apache License 2.0

FASTGenomics #540

Closed theathorn closed 3 years ago

theathorn commented 4 years ago

@mvonpapen commented on Tue Sep 24 2019

Thank you for submitting a portal to the HCA DCP Methods Registry!

To expedite your portal's addition to https://data.humancellatlas.org/analyze, please provide the following package metadata. You can easily edit this information later by clicking "Improve this page" at the bottom of your portal's detail page (example).

Required:

Optional:

FASTGenomics - a platform to share single-cell RNA sequencing data and analyses using reproducible workflows

Recent technological advances enable genomics of individual cells, the building blocks of all living organisms. Single cell data characteristics differ from those of bulk data, which has led to a plethora of new analytical strategies. However, existing solutions are useful only to experts, and there are currently no widely accepted gold standards for single cell data analysis. To meet the requirements of analytical flexibility, reproducibility, ease of use and data security, we developed FASTGenomics as a powerful, efficient, versatile, robust, safe and intuitive analytical ecosystem for single-cell transcriptomics [Scholz et al., 2018]. This development has been carried out at Comma Soft, Bonn, in collaboration with the Schultze lab at LIMES Institute in Bonn, Germany.

FASTGenomics is designed as a platform for single cell RNA-seq data open to the scientific community. A major feature is to provide the highest reproducibility and transparency for single cell data analysis to the whole community. The platform provides publicly available datasets and analyses for exploration and visualization of gene expression data. Using Docker containers provides full reproducibility and helps avoid "works only on my machine" problems. Register now at https://www.fastgenomics.org and have a look at our collection of data sets and example analyses.

FASTGenomics serves as a platform where you can share your data with the community and test novel algorithms on public data sets with known results. FASTGenomics already scales routinely to more than 300k cells per project, and prototype apps suggest that scaling to 1M cells is also possible [Scholz et al., 2018]. Moreover, its hybrid design also allows custom solutions such as FASTGenomics on premise for clinical and pharmaceutical research facilities.

For more information, you can find us on Twitter, LinkedIn or come talk to us on Slack. To register for the platform visit us at www.fastgenomics.org.

Note: We will soon offer interactive analyses based on Jupyter notebooks. Stay tuned for the upcoming beta-test!

REFERENCES  Scholz et al. (2018) FASTGenomics: An analytical ecosystem for single-cell RNA sequencing data. BioRxiv, 272476.

matthewspeir commented 4 years ago

Hello, @mvonpapen.

Thank you for submitting an analysis portal to the DCP. I will review your portal against our registry standards and let you know if we have any questions.

matthewspeir commented 4 years ago

Hello, @mvonpapen.

Thank you again for submitting an analysis portal to the registry. I'm hoping you can answer a couple of questions about how your portal adheres to our standards:

  1. For the "Use Containers and Modules" standard, it looks like you have Docker containers for your tools at https://hub.docker.com/u/fastgenomics. I noticed that you have GitHub repos for your python client https://github.com/FASTGenomics/py_client and r client https://github.com/FASTGenomics/r_client for your portal, but I didn't see a Docker image for either of these in Docker Hub. Am I just not seeing them or are they not present? If they're not present, are there plans to add them?

  2. For the "Register Upstream", I can't seem to find your platform, clients, or the related utilities registered in bioconductor or bioconda. Are there plans to register all of your stuff in one of these platforms? (It's also possible they're already there and I just missed them.)

Thanks!

Matthew

mvonpapen commented 4 years ago

Hi Matthew,

  1. The python and r clients you listed are actually outdated and will be deposited soon. They were used to access our database from outside the platform. Right now, we are actually in a closed beta phase and will soon relaunch. For that, we set up two new clients, fgread-r https://github.com/FASTGenomics/fgread-r and fgread-py https://github.com/FASTGenomics/fgread-py, that are used to load data from our internal database within the platform (https://prod.fastgenomics.org). As such, the new clients are part of the Jupyter images that we provide for the analyses, see, e.g., https://hub.docker.com/r/fastgenomics/jupyter-scanpy.

  2. In the future, we may again plan to develop clients for external access to our database. These clients could then well be added to bioconda and/or bioconductor. As our current clients are only working within the platform, we did not think about adding them to these repositories. We have also moved away from providing analysis apps (which might be added to bioconda/bioconductor) to providing complete Jupyter notebooks for the analyses. The platform itself belongs to Comma Soft AG https://www.comma-soft.com in Bonn, Germany, and will not be open-sourced.

I hope I could answer your questions. Please let me know if you need additional information.

Best regards, Mitch

mvonpapen commented 4 years ago

Hi @matthewspeir,

Did you receive my response or do you need any additional information? Just to let you know, we will soon switch our platform to beta.fastgenomics.org. prod.fastgenomics.org will be closed at the end of the month.

Best, Mitch

mvonpapen commented 4 years ago

Dear @theathorn ,

A new version of FASTGenomics has been launched, so I am slightly updating the original description. The updated submission is as follows:

Optional:

FASTGenomics - a platform to analyze and share single-cell RNA sequencing data with reproducible Jupyter notebooks

Recent technological advances enable genomics of individual cells, the building blocks of all living organisms. Single cell data characteristics differ from those of bulk data, which has led to a plethora of new analytical strategies. However, existing solutions are useful only to experts, and there are currently no widely accepted gold standards for single cell data analysis. To meet the requirements of analytical flexibility, reproducibility, ease of use and data security, we developed FASTGenomics as a powerful, efficient, versatile, robust, safe and intuitive analytical ecosystem for single-cell transcriptomics [Scholz et al., 2018]. This development has been carried out at Comma Soft, Bonn, in collaboration with the Schultze lab at LIMES Institute in Bonn, Germany.

FASTGenomics is designed as a platform for single cell RNA-seq data open to the scientific community. A major feature is to provide the highest reproducibility and transparency for single cell data analysis to the whole community. The platform provides publicly available datasets and analyses in the form of Jupyter notebooks for the exploration and visualization of gene expression data. Using Docker containers provides full reproducibility and helps avoid "works only on my machine" problems. Register now at https://www.fastgenomics.org and have a look at our collection of data sets and example analyses.

FASTGenomics serves as a platform where you can share your data with the community and test novel algorithms on public data sets with known results. FASTGenomics already scales routinely to more than 300k cells per project, and prototype apps suggest that scaling to 1M cells is also possible [Scholz et al., 2018]. Moreover, its hybrid design also allows custom solutions such as FASTGenomics on premise for clinical and pharmaceutical research facilities.

For more information, you can find us on Twitter, LinkedIn or come talk to us on Slack. To register for the platform visit us at www.fastgenomics.org.

REFERENCES  Scholz et al. (2018) FASTGenomics: An analytical ecosystem for single-cell RNA sequencing data. BioRxiv, 272476.

matthewspeir commented 4 years ago

Hello, Michael.

I apologize for not responding sooner; I just returned from 3 weeks of traveling last Friday.

Based on your response, it doesn't sound like your tool meets the registry standards (specifically 'Be Free and Open Source' and 'Register Upstream'): https://data.humancellatlas.org/contribute/analysis-tools-registry/registry-standards. Can you please review all of the standards and describe how your tool meets the six 'Required' ones listed on that page?

mvonpapen commented 4 years ago

Hi @matthewspeir ,

I hope you had a good trip :-)

FASTGenomics is indeed a little different from the other analysis tools on the HCA site in the sense that we are an online platform and not a tool or (yet) a tertiary portal. Also, our "method packages" are Jupyter notebooks and not real packages or modules. However, all our analyses and datasets are open and accessible, so I think that we meet the registry standards.

So let's go through the registry standards one by one:

Please let me know, if I'm missing something or if you need further clarification.

matthewspeir commented 4 years ago

Hi, @mvonpapen.

Thanks for providing those details. Just to be sure I'm understanding correctly, you're providing open-source tools to read data into and out of your closed platform? At least that's what I gather from looking at the Github readmes for the two fgreader tools. I'm not sure that this fits with the spirit of the DCP registry standards.

mvonpapen commented 4 years ago

Dear @matthewspeir ,

Yes, you are correct. We provide open-source tools to read data into and out of our closed platform, and additionally to analyze this data. However, using the provided source code from git, people can actually do this without using our free(!) platform.

As I said earlier, we are very interested in getting listed in the Analysis tools registry and we have already taken several steps to meet your standards. In contrast, we noticed that the Analysis tools registry even lists proprietary software such as the Bioturing Browser, which is not only closed-source but also not free. Could you please explain what exactly you mean by us not fitting the spirit of the DCP registry standards?

Best, Mitch

matthewspeir commented 4 years ago

Hi, @mvonpapen.

Admittedly, I did not write the standards and have only been involved in reviewing a few of the most recent submissions to the Analysis registry. I have been applying the standards as I understand them (and with some guidance from those who wrote them) to these submissions. Since I was not involved in adding the Bioturing Browser to the registry, I can't comment on what was involved there or how the decision was made.

I believe Jean Chang (@jlchang) and Tim Tickle (@TimothyTickle) were both involved in the creation of these standards. Perhaps they could comment on how your portal meets the DCP standards.

mvonpapen commented 4 years ago

Hi @matthewspeir ,

Thank you for the reference to @TimothyTickle and @jlchang. We are eager to meet your standards, and some advice on how to do that would help a lot at the moment.

Best, Mitch

mvonpapen commented 4 years ago

Dear @theathorn ,

We are still willing to provide all necessary information to be listed in the analysis portal registry. As I outlined above, all relevant information is publicly available on our GitHub. In addition to the analysis code, we also provide test data as well as a test environment that is easy to set up.

I noticed that there are two commercial providers (BioTuring and DNAStack) in the analysis registry at the moment that offer services similar to ours. Therefore, I hope that there is a way that FASTGenomics can also be included in the registry.

I would be very happy to hear from you.

Best regards, Mitch

theathorn commented 4 years ago

Hello @mvonpapen,

The DCP is currently in maintenance mode and not accepting new portal registrations, but we expect to resume the registration process in the coming months. I will update this ticket when that happens.

Trevor

NoopDog commented 3 years ago

Hi @mvonpapen The HCA data portal has once again begun accepting submissions to the Methods Registry. Can you take a look at this submission and make any required updates? We will then restart the approval process.

mvonpapen commented 3 years ago

Hi @NoopDog , I'm happy to hear that you re-opened for submissions. I've updated our description and look forward to proceeding with the process.

Required:

Tool title: FASTGenomics
Contact name: Team FASTGenomics
Contact email: contact@fastgenomics.org
Who to attribute: Comma Soft AG
Portal URL: https://beta.fastgenomics.org
Short description: FASTGenomics is a cloud-based collaboration platform providing data management and reproducible analyses for scRNA-seq and omics data. FASTGenomics allows you to collaborate in groups, share data, create individual analyses and publish interactive results.

Optional: Long description

FASTGenomics - a cloud-based collaboration platform for data management and reproducible analyses of scRNA-seq and omics data

Collaboration and data sharing are key in biomedical research. They involve experts from several fields of study such as Molecular Biology, Immunology, Data Science and Computer Science as well as storage and re-use of data in a reproducible environment. Our Life and Data Science experts at Comma Soft have therefore developed the open platform FASTGenomics, which provides a common infrastructure and smart data management, is easy to use, and allows direct access to data and results. It thus acts as a single point of truth and brings together all collaborators of your project.

The aim of FASTGenomics is to provide the highest reproducibility and transparency for single cell and omics data analysis to the whole community. The platform offers publicly available datasets, reproducible analyses and interactive projects for exploration and visualization of gene expression data. Docker containers provide full reproducibility and help avoid "works only on my machine" problems.

FASTGenomics is an open-access platform and is used as the central data and analytics platform in various European research projects such as the Human Cell Atlas project discovAIR and the EU H2020 project SYSCID.

We are an experienced partner with a tight network of leading experts from Bioinformatics, Immunology and Pharma. Also, we are an active member of several academic networks such as the Human Cell Atlas Lung Biological Network, Sparse2Big, and Single Cell Omics Germany. Together, we can help you get started with your research project, assist in data management, and leverage the power of state-of-the-art AI-based techniques. Our hybrid design also allows custom solutions such as FASTGenomics on-premises for clinical and pharmaceutical research facilities.

Where to find us:

Twitter: @FASTGenomics
YouTube: FASTGenomics channel
Slack: Slack support channel
GitHub: https://github.com/FASTGenomics
Docker: https://hub.docker.com/u/fastgenomics

Logo or screenshot:

FASTGenomics_oneplace

mvonpapen commented 3 years ago

Hi @NoopDog , Do you have any updates regarding the submission process?

NoopDog commented 3 years ago

Hi, @mvonpapen we hope to have the latest round of applicants processed this week. Will be back to you in the next couple of days with an update. Thank you for your patience! Cheers, D

NoopDog commented 3 years ago

Hi, @mvonpapen I have reached out internally to get clarification on the policy for listing closed source commercial portals. I will update you when we hear back. It may take a week or so for my question to be processed. Thank you for your application to the registry and your patience! Cheers Dave Rogers

NoopDog commented 3 years ago

Hi, @mvonpapen I am happy to report that FASTGenomics has been approved for posting. Your page will be on our staging server shortly. I will let you know when it's up on our staging server so you can review it.

Cheers, Dave

NoopDog commented 3 years ago

Hi @mvonpapen your portal page is up here:

https://dev.singlecell.gi.ucsc.edu/analyze/portals/fastgenomics

Can you review and let us know if you would like any updates before we push this live?

Cheers, Dave Rogers

mvonpapen commented 3 years ago

Hi @NoopDog ,

That is great news! Thank you.

I do have some updates/issues:

Thanks again, we are thrilled to be part of the HCA registry!

NoopDog commented 3 years ago

@mvonpapen we have made the requested changes and your page is now live. https://data.humancellatlas.org/analyze/portals/fastgenomics

If you would like any other updates please feel free to submit a PR. You can get started with a PR by selecting the "Improve this Page" link at the bottom of the page and editing the markdown source directly in GitHub.

Cheers, D