bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License
988 stars 354 forks

RFC: bcbio in clinical sequencing #1169

Closed schelhorn closed 7 years ago

schelhorn commented 8 years ago

I was wondering if anyone is currently using bcbio in a regulated clinical space, i.e. for analyzing patient data. I am especially interested in existing frameworks providing HIPAA/CLIA/ISO 27001 compliance, i.e. systems that wrap bcbio and provide platform security architecture, access control, consistency of results (that can be done within bcbio itself, of course), auditability, availability, and mechanisms for tracking provenance to enable withdrawal of patient consent.

Obviously, we are not looking to use bcbio as a diagnostic device (the horrors of FDA approval), but rather as a technology to characterize patient populations before clinical trials to augment our existing panels with more diverse genotypic data. I know that Curoverse is going in that direction and iRODS offers technology that aids in acquiring compliance, but none of these frameworks cover all of the issues.

If anyone has experience in these matters regarding bcbio and would be willing to give a pointer or be available for a quick chat I would greatly appreciate that - also after the holidays. Thanks a bunch.

chapmanb commented 8 years ago

Sven-Eric; We're not currently running bcbio this way ourselves, but would love to move towards this goal. From the science side, there is no fundamental difference between the needs of research and development labs and those of clinical practice, except the time frames over which changes and updates can happen. Both need correctness, built-in validation, and paths to stay on specific versions or move forward with new algorithms and approaches.

Practically, I know there are a lot of infrastructure and consistency mechanisms needed for compliance. Our in-progress plan is to move to using CWL for parallelization so we can work on platforms like Arvados that provide that infrastructure. This allows bcbio itself to focus on the biology and validation side of the work, and work with teams like Curoverse on the compliance issues. This is all still a work in progress but definitely something we'll be pushing for in the near future.

I'd love to hear other folks' thoughts and experiences as well. Thanks so much for starting this discussion.

matthdsm commented 8 years ago

Hi Brad, Sven-Eric

We are also looking for an automated analysis pipeline for our diagnostic data. Since bcbio offers validated pipelines, this project would fit the bill. And judging by the roadmap set by Brad, bcbio will stay very relevant for us.

There are still some points that I'd like to address however.

Thanks for all the work you've put in this already! Cheers M

ohofmann commented 8 years ago

Matt,

the LIMS integration has been on the todo list of my group for a while now. @guillermo-carrasco built a basic monitor that keeps an eye on the bcbio log files and reports events, but we'd have to add API calls (or have it reach out to the LIMS system). For the folder monitoring we've been looking at http://arteria-project.github.io/ and similar systems as the glue rather than trying to build this directly into bcbio as requirements probably vary widely between sites -- but that's my take, Brad might have a different view on this.

@brainstorm - when you were running bcbio at SciLife was there ever any integration with the iRods system?

guillermo-carrasco commented 8 years ago

When I was at SciLife, the only integration that we had with iRods was for backups. A tool we developed called TACA took care of moving the result files, FASTQs and everything else we wanted to back up to our iRods store on a regular basis. @matthdsm maybe you're looking for something like that for that particular task, which I would personally keep separate from the analysis, i.e. bcbio-nextgen.

This is the monitor @ohofmann pointed to, let me know if you want to give it a try and run into any issues :-). As he said, for integration with any LIMS, new API calls would have to be added. This shouldn't be very complicated, but it depends on how good the API of the target LIMS is.

Hope that helps a bit!
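
As a rough illustration of the monitor-plus-LIMS idea discussed above, here is a minimal Python sketch that reads newly appended lines from a bcbio log file, turns them into events, and hands each event to a callback. It is not the actual monitor mentioned in this thread. The log line shape (`[timestamp] message`), the event fields, and the `post_to_lims` helper with its URL are all assumptions made for illustration; a real LIMS would define its own API.

```python
import json
import re
import urllib.request
from typing import Callable, List, Optional, Tuple

# Assumed log line shape: "[2016-01-05T10:32Z] Timing: alignment".
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<message>.*)$")


def parse_event(line: str) -> Optional[dict]:
    """Turn one log line into an event dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    kind = "timing" if message.startswith("Timing:") else "info"
    return {"time": match.group("time"), "kind": kind, "message": message}


def read_new_lines(path: str, offset: int = 0) -> Tuple[List[str], int]:
    """Return complete lines appended after byte `offset`, plus the new offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline so half-written lines are retried later.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    return text.splitlines(), offset + end


def report_new_events(path: str, offset: int, send: Callable[[dict], None]) -> int:
    """Parse lines added since `offset`, pass each event to `send`, return new offset."""
    lines, new_offset = read_new_lines(path, offset)
    for line in lines:
        event = parse_event(line)
        if event is not None:
            send(event)
    return new_offset


def post_to_lims(event: dict, url: str) -> None:
    """Hypothetical LIMS call: POST the event as JSON to an assumed endpoint."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass
```

A caller would keep the returned offset between polls, for example `offset = report_new_events(log_path, offset, lambda e: post_to_lims(e, lims_url))`, where `log_path` and `lims_url` are site-specific. Passing the sender in as a callback keeps the site-specific LIMS code out of the log parsing, in line with the point that requirements vary between sites.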

matthdsm commented 8 years ago

Hi all,

Thanks a lot for chiming in! I'll make sure to check each tool and give them a shot! If I run into any issues, I'll create an issue in the appropriate repo, so as not to pollute this one too much ;)

Thanks again! M

guillermo-carrasco commented 8 years ago

Hey @matthdsm :-) TACA is quite SciLifeLab specific, so I'm not sure you can use anything other than the idea of it. I did a lot of the development there though, so feel free to ask.

chapmanb commented 8 years ago

Matt, Oliver and Guillermo; Thanks for all the discussion. For your specific questions we don't have immediate plans to support iRods or additional LIMS integration. Our current infrastructure plan is supporting the common workflow language (CWL: http://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html) so that we can run bcbio inside platforms that provide this type of functionality. Since bcbio will have a generalized representation in CWL, it can then integrate with other infrastructure so we don't have to build out everything as part of bcbio.

The reasoning behind this is that we're not realistically going to have enough time to do all this work ourselves, so we can take advantage of the strengths of other tools while continuing to build out what bcbio does well. This will let us focus more on validations, tool integration and tackling harder problems with structural variation and cancer tumor heterogeneity.

Hope this provides some background behind our current work, and still makes bcbio useful for the work you're doing.

matthdsm commented 8 years ago

Hi Brad,

I'm replying to this post to get some follow up on the status of work done in bcbio regarding CWL. We currently (almost) have bcbio up and running in our genetics context and are already looking to the future. We'd like to migrate from a regular cluster to a Mesos cluster running bcbio in Docker as part of a bigger workflow.

Thanks M

chapmanb commented 8 years ago

Thanks for following up. We're still actively working on CWL and following the path outlined in my last post. We are currently working on getting bcbio running with Toil, which supports CWL and has Mesos integration:

http://toil.readthedocs.io/en/latest/

I have not personally worked with a Mesos cluster so don't have practical experience to share but the workflow of using bcbio CWL with Toil on it should hopefully support the type of system you're building out. Hope this helps.

schelhorn commented 7 years ago

So, as a late update on this topic: we have re-implemented the relevant bcbio workflows on a commercial GxP-compliant, validated (CSV) sequence analysis platform that sits on top of iRODS and provides role-based access, SDTM export, chain-of-custody, audit logs and provenance tracking. I am happy to provide more info on personal contact.

mjafin commented 7 years ago

Nice work

chapmanb commented 7 years ago

Sven-Eric -- wow, congrats that is great news. I'd definitely love to hear more details if you're able to share either publicly or via e-mail. Out of curiosity, how much of bcbio were you able to re-use? Did you have to re-implement entirely or are you able to use bcbio within your chosen platform? As you know one of our current goals is to make bcbio more portable to multiple platforms, so it would be useful to hear your practical experience with doing that right now. Thanks much for sharing and congrats again on all the hard work that must have gone into putting this together.

ohofmann commented 7 years ago

Curious as well. Happy to take this to email or a video chat. Auditing will start for us in Q1/Q2 2017.

matthdsm commented 7 years ago

Hi @schelhorn

Very happy to hear that, we'd be very interested to hear more about your experiences, as we're trying to accomplish the exact same thing! How do you prefer we contact you?

M

schelhorn commented 7 years ago

Sure, I'd be happy to share our architecture and learnings in an academic/precompetitive setting. I'd suggest that whoever is interested in such a TC may contact me on LinkedIn and there send me an email address and time zone. I'll try to find a slot end of January/beginning of February and we can do a Skype call (preferred) or Hangout/Webex then.

matthdsm commented 7 years ago

Hi @schelhorn

Since I don't have a premium account, I'm not able to send you any "InMail". I did however send you an invite to connect. We're very interested to have a talk with you. You can find all contact info on my GitHub profile.

Thanks, M

schelhorn commented 7 years ago

Sure, feel free to send me a contact request.