Run MiSeq pipeline under Shipyard

donkirkby commented 10 years ago

This task is to check out the shipyard branch of the MiSeq pipeline project, set up all the configuration, and run the pipeline on a developer workstation for an example FASTQ file.

donkirkby commented 10 years ago

Configuration Steps

Since I'm probably going to do this many times, I will record the configuration steps here. They're based on the instructions in INSTALL.md.

Create the raw datasets: the two example FASTQ reads and the two reference files.

cd ~/git/Shipyard/shipyard
python manage.py shell
from librarian.models import SymbolicDataset
from django.contrib.auth.models import User
u = User.objects.get(username='shipyard')
SymbolicDataset.create_SD('/mnt/data/don/data/RAW_DATA/140522/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R1_001.fastq', user=u, name='46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R1_001.fastq', description='example FASTQ, forward read')
SymbolicDataset.create_SD('/mnt/data/don/data/RAW_DATA/140522/46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R2_001.fastq', user=u, name='46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R2_001.fastq', description='example FASTQ, reverse read')
SymbolicDataset.create_SD('/home/don/git/MiseqPipeline/reference_sequences/cfe.fasta', user=u, name='cfe.fasta', description='initial set of references in FASTA format')
SymbolicDataset.create_SD('/home/don/git/MiseqPipeline/reference_sequences/csf2counts_amino_refseqs.csv', user=u, name='csf2counts_amino_refseqs.csv', description='amino acid reference sequences')
exit()
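Since these steps will be repeated, the four calls above could be driven from a list. This is only a sketch: `DATASETS` and `register_datasets` are hypothetical names, and it assumes `SymbolicDataset.create_SD` keeps accepting the same arguments as in the session above (path, `user`, `name`, `description`). The creation function is passed in as a parameter so the helper can be tried without a Shipyard install.

```python
import os

# (path, description) pairs copied from the shell session above.
DATASETS = [
    ('/mnt/data/don/data/RAW_DATA/140522/'
     '46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R1_001.fastq',
     'example FASTQ, forward read'),
    ('/mnt/data/don/data/RAW_DATA/140522/'
     '46824A-3515-HLA-B-E99601CLIMX-PR-RT_S23_L001_R2_001.fastq',
     'example FASTQ, reverse read'),
    ('/home/don/git/MiseqPipeline/reference_sequences/cfe.fasta',
     'initial set of references in FASTA format'),
    ('/home/don/git/MiseqPipeline/reference_sequences/'
     'csf2counts_amino_refseqs.csv',
     'amino acid reference sequences'),
]


def register_datasets(create_sd, user, datasets=DATASETS):
    """Call create_sd once per file, naming each dataset after its file.

    create_sd is assumed to behave like SymbolicDataset.create_SD.
    """
    created = []
    for path, description in datasets:
        name = os.path.basename(path)  # same names as the manual calls
        created.append(
            create_sd(path, user=user, name=name, description=description))
    return created
```

Inside `python manage.py shell`, after the two imports and the `User` lookup, the call would be `register_datasets(SymbolicDataset.create_SD, u)`.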

Go to the Shipyard web interface, and navigate to the Developer portal.
Add code resources for the following files: settings.py, hyphyAlign.py, prelim_map.py, remap.py, sam2csf.py, csf2counts.py. Copy the description from each script's help text, and add a dependency on settings.py to all the other code resources, except hyphyAlign.py. csf2counts.py also needs a dependency on hyphyAlign.py. Leave each dependency's path blank, and use a filename of settings.py or hyphyAlign.py.
Add a method for each code resource that you just created. Look at each script's help text to see the list of inputs and outputs. Choose unstructured datatype for each input and output.
Add a new pipeline.
Add all the inputs and methods, then wire them up. For now, you have to look at each script's help text to see the names of the inputs and outputs, then figure out which ones connect.
Type a name and description for the pipeline, then click Submit.
Don't worry if there is no response. Just go back to the list of pipelines and check that yours appears.
Navigate up to the home page, and then down to Users portal: Analysis.
Select your pipeline in the middle section, and then your inputs in the left section.
Click the Run button.
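The code resource dependencies described in the steps above can be written down as a small table, which makes it easier to add the resources in an order where each dependency already exists. This is a sketch for planning only: `DEPENDENCIES` and `upload_order` are hypothetical names, and whether Shipyard actually requires dependencies to be added first is an assumption, not something stated in INSTALL.md.

```python
# Each code resource mapped to the resources it depends on,
# as listed in the configuration steps.
DEPENDENCIES = {
    'settings.py': [],
    'hyphyAlign.py': [],
    'prelim_map.py': ['settings.py'],
    'remap.py': ['settings.py'],
    'sam2csf.py': ['settings.py'],
    'csf2counts.py': ['settings.py', 'hyphyAlign.py'],
}


def upload_order(dependencies):
    """Return resource names so each one comes after all its dependencies."""
    ordered = []
    remaining = dict(dependencies)
    while remaining:
        # A resource is ready once everything it depends on is already listed.
        ready = [name for name, deps in remaining.items()
                 if all(dep in ordered for dep in deps)]
        if not ready:
            raise ValueError('circular or missing dependency')
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered


if __name__ == '__main__':
    for name in upload_order(DEPENDENCIES):
        print(name)
```

With the table above, settings.py and hyphyAlign.py come first, and csf2counts.py is only listed once both of them are in place.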

donkirkby commented 10 years ago

I successfully ran the MiSeq pipeline under Shipyard for a single sample. Woot! I had to hack around a couple of bugs to make it work, so I will record those as issues tomorrow before closing this issue.

ArtPoon commented 10 years ago

Sweet!

On Jun 17, 2014, at 5:05 PM, Don Kirkby wrote:

I successfully ran the MiSeq pipeline under Shipyard for a single sample. Woot! I had to hack around a couple of bugs to make it work, so I will record those as issues tomorrow before closing this issue.


donkirkby commented 10 years ago

The outstanding issues are #125 and #126, but this issue can close.

cfe-lab / Kive

Run MiSeq pipeline under Shipyard #111
