jharenza closed this 2 years ago
@ewafula and @afarrel, for the Dockerfile I'll need a breakdown of the environment requirements to run these tools.
For the "cavatica app", do we just want tools or do we need a whole workflow? I noticed there's a bash script in the attached PR; is the hope to transform that into a workflow?
@dmiller15, here is the updated PR that includes the CBTN 850k arrays: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/169
The relevant analysis script is 01-preprocess-illumina-arrays.R.
The wrapper script, run-preprocess-illumina-arrays.sh, is only needed if the analysis is performed in a certain way, which might not be the case on CAVATICA, as described in the README.
The module is designed to run within a local clone of the OpenPedCan-analysis repo, meaning that its path settings require the repo to be cloned locally for successful execution. I am sure I'll need to tweak the code a little to conform to how things need to be on CAVATICA.
Let's set up at least one meeting to hash out all the details. I think that will make progress faster than exchanging messages here.
Cc @afarrel
That sounds good. Let me know when you'd like to do the meeting.
@dmiller15, does 2:30 PM today work for you?
I'm busy today and tomorrow in the afternoon. I am free tomorrow morning, all day Monday, and Tuesday morning.
Let's meet tomorrow morning at 10am.
@dmiller15, I have amended the methylation preprocessing code to remove all OpenPedCan repo module dependencies, so it should now work on any machine with the required packages installed. There is a README file in the compressed folder that describes all the files, including: 1) the TARGET/CBTN test array datasets and corresponding manifests, 2) the array preprocessing R script, 3) the Perl script for creating smaller batches, and 4) step-by-step instructions on how to execute the two scripts using both the complete test datasets and the batches.
@afarrel, would you mind allowing @dmiller15 access to retrieve the following gzipped file from Isilon for wrapping and testing the methylation array preprocessing script in CAVATICA:
/mnt/isilon/opentargets/Methylation/cavatica_methylation_app.tar.gz
@chinwallaa can you grant @dmiller15 access to the opentargets drive on Isilon?
Yes, putting in the request. If @yuankunzhu or @zhangb1 already have access, or @ewafula or @afarrel, can one of you upload the app to a CAVATICA project for @dmiller15?
I have been given access, allegedly, to Isilon but the directions that the Service Center provided do not seem to be working for me.
@ewafula, can you help?
@dmiller15, can you send me the instructions they sent you for access? (They should cover access via both SMB and NFS.)
@jharenza, I had a chat on Slack with @dmiller15 and have already contacted @chinwallaa and @afarrel for help.
Alternatively, to echo @jharenza, if one already has access to Isilon, it is quite easy for a user to upload a file to a project. Basically (using the CAVATICA API endpoint https://cavatica-api.sbgenomics.com/v2):
sb upload --file <name of file> --project <user/project-name>
@dmiller15 would be able to specify the --project param if and when the CAVATICA project is created/determined.
Even better, I just realized I have access to Isilon myself and, miraculously, can access it. However, @chinwallaa, I need to be added to the opentargets group to access the directory.
@migbro, I just requested access for you to the /mnt/isilon/opentargets share. Miguel, which server (HPC and/or VM) do you log into to access Isilon shares via NFS?
I use the VMware Horizon Client and log in to the CHOP-EDU domain with my CHOP username and password.
I don't remember exactly how, but I connect to the beyond.chop.edu endpoint to reach RES-RHEL-HPC.
That gives me a virtual desktop where /mnt/isilon is accessible from the terminal app within that VM.
I had tried using smb://ressmb03.research.chop.edu, but the opentargets folder does not show up there.
The opentargets share is set up for the NFS protocol (so it has to be accessed via a Linux VM/server within the CHOP firewall). Will follow up with RIS to add SMB to the share.
Group access has come through. The CAVATICA project has been created here: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/files/#q, and the test tarball has been uploaded. @dmiller15 is aware and can take it from here.
OK, I've got the files. I'll start working up the CWL now, but which GitHub repo should the CWL from this task go into?
OK, the CWL has been written, and I've completed a couple of test runs: CBTN: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/tasks/325f9e65-deb6-4857-ad86-55df8f356f49/ TARGET: https://cavatica.sbgenomics.com/u/d3b-bixu/ot-methylation-dev/tasks/f10aefd2-a363-464a-9e1f-c0adc7b7b6b6/
Let me know where you'd like the code, and I'll push it there to complete this task.
Thank you, @dmiller15, for working on this! @jharenza, @migbro, which repo does the CAVATICA app code get pushed to? I am not sure where the code for the other OT CAVATICA apps is hosted, i.e., the CAVATICA app for the DESeq2 module that @sangeetashukla and @ferrel work on.
If you want to follow what was done for DESeq2, it was placed in analyses:
https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/tumor-normal-differential-expression
So @ewafula, please advise where the current methylation tool was placed, and @dmiller15 can add the folder structure as was done here (adding tools and workflows dirs). Also, @sangeetashukla had created this in the README: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/tumor-normal-differential-expression#cavatica, so in the README for the methylation analysis dir, I'd recommend giving similar advice. Dan may need assistance filling in details to follow suit, such as where one would normally get these input files. Does that make sense?
One other thing to note is that we probably want to add this to our somatic workflow for KF as well (or the toolkit?). E.g., we have more CBTN methylation data incoming and will want to process it via this method.
Well, probably not the prod somatic workflow. That workflow operates on sequencing read data, whereas I think this is array data. As far as I know, this is kind of atypical, unless there are plans to do methylation arrays on all KF studies from now on. Perhaps as its own wf...
@migbro, @dmiller15, the OpenPedCan methylation-processing module that the CAVATICA workflow app code is based on is still in an unapproved PR (see link below). @afarrel needs to approve and merge it; then @dmiller15 can follow the process that was used for DESeq2, as you explained above, and integrate it into a module subfolder.
https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/169
@migbro gotcha, yeah, probably as its own then. I think we are getting more arrays from NCI via the P30s, so maybe this becomes part of @zhangb1's run of the toolkit for these CCDI initiatives?
I went ahead and made a PR: https://github.com/ewafula/OpenPedCan-analysis/pull/1. It's in @ewafula's repository, as that's where the base branch is located. I can change it later if it doesn't automatically change after the merge.
The associated PR has been merged into @ewafula's branch, which stands to be merged into OpenTargets. This task is complete.
[Required] Is this a new tool/workflow?
Yes
[Required] Which tools/workflows would you like to update/add?
Provide software links and dockerPull location if applicable
Methylation processing: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/pull/158
This has so far been run on Isilon by @ewafula, who can provide more details. There is not yet a Dockerfile.
[Required] What features (new params, inputs, outputs, etc) would you like to add to each?
Add reference files and test input locations if applicable
Please create a Dockerfile and CAVATICA app for this so that future samples can be run on CAVATICA and as part of the bix toolkit.
[Optional] How long do you think this work will take?
1 week
[Optional] Who will complete this work?
@dmiller15 with input from @ewafula and @afarrel
Ticket will be considered resolved either by a successful PR with updated release if applicable, or a simple denial of the request backed by a good reason