d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

D3b Bix Dev Feature: Create Docker/Cavatica app for methylation array processing #291

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

[Required] Is this a new tool/workflow?

Yes

[Required] Which tools/workflows would you like to update/add?

Provide software links and dockerPull location if applicable

Methylation processing: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/158 This has currently been run on isilon by @ewafula who can provide more details. There is not yet a dockerfile.

[Required] What features (new params, inputs, outputs, etc) would you like to add to each?

Add reference files and test input locations if applicable

Please create a dockerfile and cavatica app for this for future samples to be run on cavatica and as part of the bix toolkit

[Optional] How long do you think this work will take?

1 week

[Optional] Who will complete this work?

@dmiller15 with input from @ewafula and @afarrel

Ticket will be considered resolved either by a successful PR with updated release if applicable, or a simple denial of the request backed by a good reason

dmiller15 commented 2 years ago

@ewafula and @afarrel, for the Dockerfile I'll need a breakdown of environmental requirements to run these tools.

For the "cavatica app", do we just want tools or do we need a whole workflow? I noticed there's a bash script in the attached PR; is the hope to transform that into a workflow?

ewafula commented 2 years ago

@dmiller15,

Here is the updated PR that includes the CBTN 850k arrays: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/169

The relevant analysis script is 01-preprocess-illumina-arrays.R

The wrapper script, run-preprocess-illumina-arrays.sh, is only needed if the analysis is performed a certain way, which might not be the case on Cavatica, as described in the README.

The module is designed to run within a local OpenPedCan-analysis repo, meaning that there are path settings that will require the repo to be cloned locally for successful execution. I am sure I'll need to tweak the code a little bit to conform to how things need to be on CAVATICA.

Let's set up at least a single meeting to hash out all details. I think that will make progress faster than exchanging messages here.

Cc @afarrel

dmiller15 commented 2 years ago

That sounds good. Let me know when you'd like to do the meeting.

ewafula commented 2 years ago

@dmiller15, does 2:30 PM today work for you?

dmiller15 commented 2 years ago

I'm busy today and tomorrow in the afternoon. I am free tomorrow morning, all day Monday, and Tuesday morning.

ewafula commented 2 years ago

Let's meet tomorrow morning at 10am.

ewafula commented 2 years ago

@dmiller15, I have amended the methylation preprocessing code to remove all OpenPedCan repo module dependencies, and it should now work on any machine with the required packages installed. There is a README file in the compressed folder that describes all the files, including the 1) TARGET/CBTN test array datasets and corresponding manifests, 2) array preprocessing R script, 3) Perl script for creating smaller batches, and 4) step-by-step instructions on how to execute the two scripts using both the complete test datasets and batches.

@afarrel, would you mind allowing @dmiller15 access to retrieve the following gzipped file from Isilon for wrapping and testing the methylation array preprocessing script in CAVATICA: /mnt/isilon/opentargets/Methylation/cavatica_methylation_app.tar.gz

afarrel commented 2 years ago

@chinwallaa can you grant @dmiller15 access to the opentargets drive on Isilon?

chinwallaa commented 2 years ago

Yes, putting in the request -

jharenza commented 2 years ago

if @yuankunzhu or @zhangb1 already have access, or @ewafula or @afarrel - can one of you upload the app to a CAVATICA project for @dmiller15 ?

dmiller15 commented 2 years ago

I have been given access, allegedly, to Isilon but the directions that the Service Center provided do not seem to be working for me.

jharenza commented 2 years ago

@ewafula can you help

chinwallaa commented 2 years ago

@dmiller15 can you send me the instructions they sent you for access - (should be instructions for accessing both via SMB and NFS)

ewafula commented 2 years ago

@jharenza, had a chat on slack with @dmiller15 and already contacted @chinwallaa and @afarrel for help.

migbro commented 2 years ago

Alternatively, to echo @jharenza, if one already has access to isilon, it is quite easy for a user to upload a file to a project. Basically:

  1. get seven bridges CLI: https://docs.sevenbridges.com/docs/command-line-interface
  2. Create a credentials file in your home dir if you haven't already: https://docs.sevenbridges.com/docs/store-credentials-to-access-seven-bridges-client-applications-and-libraries (be sure the endpoint is https://cavatica-api.sbgenomics.com/v2)
  3. upload with sb upload --file <name of file> --project <user/project-name>

@dmiller15 would be able to specify the --project param if and when the cavatica project is created/determined
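For step 2, here is a minimal Python sketch of creating the credentials file. It assumes the INI-style layout from the linked Seven Bridges docs: a named profile with `api_endpoint` and `auth_token` keys, normally stored at `~/.sevenbridges/credentials`. Check that layout against the docs before relying on it. The function name `write_sb_profile` and the placeholder token are hypothetical, and the demo writes to a temporary directory so it cannot overwrite a real credentials file.

```python
import configparser
import tempfile
from pathlib import Path

# Endpoint quoted in step 2 above.
CAVATICA_ENDPOINT = "https://cavatica-api.sbgenomics.com/v2"


def write_sb_profile(credentials_path, token, profile="default",
                     endpoint=CAVATICA_ENDPOINT):
    """Add or update one profile in a Seven Bridges style credentials file.

    Assumed layout (verify against the linked docs):
        [default]
        api_endpoint = https://cavatica-api.sbgenomics.com/v2
        auth_token = <your token>
    """
    path = Path(credentials_path)
    config = configparser.ConfigParser()
    if path.exists():
        config.read(path)  # keep any profiles that are already defined
    config[profile] = {"api_endpoint": endpoint, "auth_token": token}
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as handle:
        config.write(handle)
    path.chmod(0o600)  # the token is a secret, so restrict it to the owner
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory; for real use, point this at
    # Path.home() / ".sevenbridges" / "credentials".
    demo_dir = Path(tempfile.mkdtemp())
    written = write_sb_profile(demo_dir / ".sevenbridges" / "credentials",
                               token="REPLACE_WITH_YOUR_TOKEN")
    print(written.read_text())
```

Reading the existing file before writing means a profile for another Seven Bridges platform, if one is already configured, is left in place.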

migbro commented 2 years ago

Even better, I just realized I have access to isilon myself, and miraculously can access it. However, @chinwallaa, I need to be added to the opentargets group to access the directory.

chinwallaa commented 2 years ago

@migbro just requested access for you to the /mnt/isilon/opentargets share. Miguel, what server (HPC and/or VM) do you log into to access isilon shares via NFS?

migbro commented 2 years ago

I use the VMware Horizon Client and log in to the CHOP-EDU domain with my CHOP username and password. I don't remember how, but I connect to the beyond.chop.edu endpoint to reach RES-RHEL-HPC.

That will give me a virtual desktop with /mnt/isilon accessible in that vm interface using the terminal app within that vm.

I had tried using smb://ressmb03.research.chop.edu, but the opentargets folder does not show there.

chinwallaa commented 2 years ago

the opentargets share is set up for NFS protocol (so need to access via a linux VM/server within CHOP firewall). Will follow up with RIS to add SMB to the share.

migbro commented 2 years ago

Group access has come through. Cavatica project created here: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/files/#q, test tar ball has also been uploaded. @dmiller15 is aware and can take it from here

dmiller15 commented 2 years ago

Ok I've got the files. I'll start working up the CWL now but which github repo should the CWL from this task go into?

dmiller15 commented 2 years ago

Ok the CWL has been written and I've got a couple of test runs done here: CBTN: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/tasks/325f9e65-deb6-4857-ad86-55df8f356f49/ TARGET: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/tasks/f10aefd2-a363-464a-9e1f-c0adc7b7b6b6/

Let me know where you'd like the code, and I'll push it there to complete this task.

ewafula commented 2 years ago

Thank you, @dmiller15, for working on this! @jharenza, @migbro, which repo does the CAVATICA app code get pushed to? I am not sure where the code for the other OT CAVATICA apps is hosted, i.e., the CAVATICA app for the DESeq2 module that @sangeetashukla and @ferrel work on.

migbro commented 2 years ago

If you want to follow what was done for DESeq2, it was placed in analyses: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/tumor-normal-differential-expression So @ewafula , please advise where the current methylation tool was placed, and @dmiller15 can add the folder structure as was done here (adding a tools and workflows dir). Also, @sangeetashukla had created this in the README: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/tumor-normal-differential-expression#cavatica, so in the README for that methylation analysis dir, I'd recommend giving similar advice. Dan may need assistance in filling in details, like where can one normally get these input files to follow suit. Does that make sense?

jharenza commented 2 years ago

One other thing to note is that we probably want to add this to our somatic workflow for KF as well (or the toolkit?). Eg - we have more CBTN methylation incoming and will want to process via this method.

migbro commented 2 years ago

Well, probably not the prod somatic workflow. That workflow operates on sequencing read data; I think this is array data. As far as I know, this is kind of atypical, unless there are plans to do methylation arrays on all KF studies from now on. Perhaps as its own wf...

ewafula commented 2 years ago

@migbro, @dmiller15, the current OpenPedCan methylation-processing module that the CAVATICA workflow app code is based on is still an unapproved PR (see link below). @afarrel needs to approve and merge it, then @dmiller15 can follow the process that was done with DESeq2, as you explained above, and integrate it into a module subfolder. https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/169

jharenza commented 2 years ago

@migbro gotcha - yeah probably as its own then. I think we are getting more arrays from NCI via the P30s so maybe this becomes a part of @zhangb1 's run of the toolkit for these CCDI initiatives?

dmiller15 commented 2 years ago

I went ahead and made a PR: https://github.com/ewafula/OpenPedCan-analysis/pull/1. It's in @ewafula's repository as that's where the base branch is located. We can change it later if it doesn't automatically change after the merge.

dmiller15 commented 2 years ago

The associated PR has been merged into @ewafula's branch, which stands to be merged into OpenTargets. This task is complete.