NationalGenomicsInfrastructure / piper

A genomics pipeline built on top of the GATK Queue framework

SetupFileCreator problem #18

Closed: vezzi closed this issue 9 years ago

vezzi commented 10 years ago

I was looking at the setup.xml file created automatically by SetupFileCreator and found several strange things.

This is the folder structure:

tree
.
|-- 130611_AH0CCVADXX
|   |-- report.tsv
|   `-- Sample_P567_101
|       |-- P567_101_NoIndex_L001_R1_001.fastq.gz
|       |-- P567_101_NoIndex_L001_R2_001.fastq.gz
|       |-- P567_101_NoIndex_L002_R1_001.fastq.gz
|       `-- P567_101_NoIndex_L002_R2_001.fastq.gz
|-- 130612_AH056WADXX
|   |-- report.tsv
|   `-- Sample_P567_101
|       |-- P567_101_GCCAAT_L001_R1_001.fastq.gz
|       |-- P567_101_GCCAAT_L001_R2_001.fastq.gz
|       |-- P567_101_GCCAAT_L002_R1_001.fastq.gz
|       `-- P567_101_GCCAAT_L002_R2_001.fastq.gz
|-- 130627_AH0JYUADXX
|   |-- report.tsv
|   `-- Sample_P567_102
|       |-- P567_102_TGACCA_L001_R1_001.fastq.gz
|       |-- P567_102_TGACCA_L001_R2_001.fastq.gz
|       |-- P567_102_TGACCA_L002_R1_001.fastq.gz
|       `-- P567_102_TGACCA_L002_R2_001.fastq.gz
|-- 130701_AH0J92ADXX
|   |-- report.tsv
|   `-- Sample_P567_102
|       |-- P567_102_TGACCA_L001_R1_001.fastq.gz
|       |-- P567_102_TGACCA_L001_R2_001.fastq.gz
|       |-- P567_102_TGACCA_L002_R1_001.fastq.gz
|       `-- P567_102_TGACCA_L002_R2_001.fastq.gz
|-- 130701_BH0JMGADXX
|   |-- report.tsv
|   `-- Sample_P567_102
|       |-- P567_102_TGACCA_L001_R1_001.fastq.gz
|       |-- P567_102_TGACCA_L001_R2_001.fastq.gz
|       |-- P567_102_TGACCA_L002_R1_001.fastq.gz
|       `-- P567_102_TGACCA_L002_R2_001.fastq.gz
`-- A.Wedell_13_03_UUSNP_setup.xml

This is the command line used to produce A.Wedell_13_03_UUSNP_setup.xml

setupFileCreator \
    --output /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/A.Wedell_13_03_UUSNP_setup.xml \
    --project_name A.Wedell_13_03_UUSNP \
    --sequencing_platform Illumina \
    --sequencing_center NGI \
    --uppnex_project_id a2010002 \
    --reference /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta \
    --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/Sample_P567_101 \
    --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101 \
    --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102 \
    --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102 \
    --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102

The generated file is included at the bottom of this comment. As expected, there are 5 run folders, but there are 15 sample folders: strangely, the first run folder has 5 sample folders, the second 4, the third 3, and so on. It looks like there is a bug in setupFileCreator, or in the way we pass the folders to the script.

As a further example of what I mean, consider that the line

<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>

is repeated in all 5 run folders while, if I have understood it correctly, it should only be present in the last ... section.
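The triangular pattern (5, 4, 3, 2, 1 sample folders, 15 in total) is exactly what you get if one run folder is emitted per `--input_sample` and the run folder for the i-th input receives that input plus every input after it. The Python sketch below is a hypothetical reconstruction of that failure mode, not code taken from piper's SetupFileCreator; `buggy_grouping` and `grouped_by_run_folder` are illustrative names. It contrasts the suspected behaviour with simply listing each sample folder under its parent run folder.

```python
from pathlib import PurePosixPath

# The five --input_sample folders, in the order given on the command line
# (shortened to "<run folder>/<sample folder>" for readability).
inputs = [
    "130611_AH0CCVADXX/Sample_P567_101",
    "130612_AH056WADXX/Sample_P567_101",
    "130701_BH0JMGADXX/Sample_P567_102",
    "130701_AH0J92ADXX/Sample_P567_102",
    "130627_AH0JYUADXX/Sample_P567_102",
]


def buggy_grouping(samples):
    """Hypothetical bug: one run folder per input, and the run folder built
    for the i-th input receives every input from position i onwards."""
    runfolders = []
    for i, sample in enumerate(samples):
        run = str(PurePosixPath(sample).parent)
        runfolders.append((run, samples[i:]))
    return runfolders


def grouped_by_run_folder(samples):
    """Intended behaviour: each sample folder is listed once, under the
    run folder that contains it (insertion order is preserved)."""
    runfolders = {}
    for sample in samples:
        run = str(PurePosixPath(sample).parent)
        runfolders.setdefault(run, []).append(sample)
    return runfolders


print([len(s) for _, s in buggy_grouping(inputs)])               # [5, 4, 3, 2, 1]
print([len(s) for s in grouped_by_run_folder(inputs).values()])  # [1, 1, 1, 1, 1]
```

The buggy version also reproduces the order of the sample folders inside each run folder in the XML below. With the four M.Kaller_14_06 sample folders from the follow-up comment, which all live in 140702_AC41A2ANXX, it gives 4, 3, 2 and 1 sample folders, while grouping by parent yields a single run folder with all 4 samples.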

This is the generated XML file (it seems I cannot attach files to a GitHub issue):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
    <metadata>
        <name>A.Wedell_13_03_UUSNP</name>
        <sequenceingcenter>NGI</sequenceingcenter>
        <platfrom>Illumina</platfrom>
        <uppmaxprojectid>a2010002</uppmaxprojectid>
        <uppmaxqos></uppmaxqos>
    </metadata>
    <inputs>
        <runfolder>
            <report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/report.tsv</report>
            <samplefolder>
                <name>P567_101</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/Sample_P567_101</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_101</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/report.tsv</report>
            <samplefolder>
                <name>P567_101</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/report.tsv</report>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/report.tsv</report>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/report.tsv</report>
            <samplefolder>
                <name>P567_102</name>
                <path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
    </inputs>
</project>
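A quick way to spot this kind of problem in a generated file is to check that every `<path>` lies directly inside the run folder whose `report.tsv` it is listed under, and that no sample folder appears in more than one `<runfolder>`. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the `setup.xml.molmed` default namespace and the element names shown above; `check_setup` is a hypothetical helper, not part of piper.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import PurePosixPath

NS = {"s": "setup.xml.molmed"}  # default namespace of the <project> root


def check_setup(xml_text):
    """Return (misplaced, repeated) for a setup.xml document:
    misplaced -- (run dir, sample path) pairs where the sample folder is not
                 directly inside the directory holding that run's report.tsv
    repeated  -- sample paths listed in more than one <runfolder>, with counts
    """
    root = ET.fromstring(xml_text)
    misplaced = []
    seen = Counter()
    for runfolder in root.findall("s:inputs/s:runfolder", NS):
        report = runfolder.findtext("s:report", namespaces=NS)
        run_dir = PurePosixPath(report.strip()).parent
        for path_el in runfolder.findall("s:samplefolder/s:path", NS):
            sample_path = PurePosixPath(path_el.text.strip())
            seen[str(sample_path)] += 1
            if sample_path.parent != run_dir:
                misplaced.append((str(run_dir), str(sample_path)))
    repeated = {path: n for path, n in seen.items() if n > 1}
    return misplaced, repeated


# Toy input with the same shape as the broken file: sample S2 is listed
# under RUN1 even though it lives in RUN2, and it appears twice.
example_xml = """<project xmlns="setup.xml.molmed"><inputs>
  <runfolder><report>/a/RUN1/report.tsv</report>
    <samplefolder><name>S1</name><path>/a/RUN1/Sample_S1</path></samplefolder>
    <samplefolder><name>S2</name><path>/a/RUN2/Sample_S2</path></samplefolder>
  </runfolder>
  <runfolder><report>/a/RUN2/report.tsv</report>
    <samplefolder><name>S2</name><path>/a/RUN2/Sample_S2</path></samplefolder>
  </runfolder>
</inputs></project>"""

misplaced, repeated = check_setup(example_xml)
print(misplaced)  # [('/a/RUN1', '/a/RUN2/Sample_S2')]
print(repeated)   # {'/a/RUN2/Sample_S2': 2}
```

Both checks are useful: in the first file the extra entries are misplaced, whereas in the M.Kaller_14_06 file further down every run folder points at the same report, so only the repeated-path check catches the problem.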
vezzi commented 10 years ago

Probably the same problem, but I'm pointing it out anyway. Related to issue #20: I manually created a UUSNPSEQ directory for project M.Kaller_14_06 (I need to analyse it independently).

tree -L 2
.
|-- 140702_AC41A2ANXX
|   |-- report.tsv
|   |-- Sample_P1171_102
|   |   |-- P1171_102_ATTCAGAA-CCTATCCT_L001_R1_001.fastq.gz
|   |   |-- P1171_102_ATTCAGAA-CCTATCCT_L001_R2_001.fastq.gz
......
|   |-- Sample_P1171_104
|   |-- Sample_P1171_106
|   `-- Sample_P1171_108
`-- pipelineSetup.xml

Then I ran the following command:

setupFileCreator -o pipelineSetup.xml -p M.Kaller_14_06 -s Illumina -c NGI -a a2010002 \
    -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_102/ \
    -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104 \
    -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106 \
    -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108 \
    -r /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta

The report.tsv file is the following:

 cat 140702_AC41A2ANXX/report.tsv 
#SampleName     Lane    ReadLibrary     FlowcellId
P1171_102       1       A       AC41A2ANXX
P1171_102       2       A       AC41A2ANXX
P1171_102       3       A       AC41A2ANXX
P1171_102       4       A       AC41A2ANXX
P1171_102       5       A       AC41A2ANXX
P1171_102       6       A       AC41A2ANXX
P1171_102       7       A       AC41A2ANXX
P1171_102       8       A       AC41A2ANXX
P1171_104       1       A       AC41A2ANXX
P1171_104       2       A       AC41A2ANXX
P1171_104       3       A       AC41A2ANXX
P1171_104       4       A       AC41A2ANXX
P1171_104       5       A       AC41A2ANXX
P1171_104       6       A       AC41A2ANXX
P1171_104       7       A       AC41A2ANXX
P1171_104       8       A       AC41A2ANXX
P1171_106       1       A       AC41A2ANXX
P1171_106       2       A       AC41A2ANXX
P1171_106       3       A       AC41A2ANXX
P1171_106       4       A       AC41A2ANXX
P1171_106       5       A       AC41A2ANXX
P1171_106       6       A       AC41A2ANXX
P1171_106       7       A       AC41A2ANXX
P1171_106       8       A       AC41A2ANXX
P1171_108       1       A       AC41A2ANXX
P1171_108       2       A       AC41A2ANXX
P1171_108       3       A       AC41A2ANXX
P1171_108       4       A       AC41A2ANXX
P1171_108       5       A       AC41A2ANXX
P1171_108       6       A       AC41A2ANXX
P1171_108       7       A       AC41A2ANXX
P1171_108       8       A       AC41A2ANXX
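The report lists a single flowcell (AC41A2ANXX) with four samples, each on lanes 1 to 8, so one `<runfolder>` with four `<samplefolder>` entries is what one would expect from this directory. Below is a small sketch that summarizes a report.tsv this way, assuming whitespace-separated columns in the order shown and a header line starting with `#`; `samples_per_flowcell` is a hypothetical helper name, not a piper function.

```python
from collections import defaultdict


def samples_per_flowcell(report_text):
    """Map FlowcellId -> {SampleName: sorted list of lanes} for a report.tsv
    whose columns are SampleName, Lane, ReadLibrary, FlowcellId."""
    layout = defaultdict(lambda: defaultdict(list))
    for line in report_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        sample, lane, _library, flowcell = line.split()
        layout[flowcell][sample].append(int(lane))
    return {
        flowcell: {sample: sorted(lanes) for sample, lanes in samples.items()}
        for flowcell, samples in layout.items()
    }


# Rebuild the report shown above: 4 samples x 8 lanes on flowcell AC41A2ANXX.
report = "#SampleName\tLane\tReadLibrary\tFlowcellId\n" + "".join(
    f"{sample}\t{lane}\tA\tAC41A2ANXX\n"
    for sample in ["P1171_102", "P1171_104", "P1171_106", "P1171_108"]
    for lane in range(1, 9)
)
layout = samples_per_flowcell(report)
print(list(layout))               # ['AC41A2ANXX']
print(len(layout["AC41A2ANXX"]))  # 4
```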

The resulting XML file is copied at the end of this comment.

What makes me suspicious is that there are 4 run folders: the first with 4 sample folders, the second with 3, and so on.

@johandahlberg I suppose only the first run folder entry is correct, and the other three are not supposed to be there, right?

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
    <metadata>
        <name>M.Kaller_14_06</name>
        <sequenceingcenter>NGI</sequenceingcenter>
        <platfrom>Illumina</platfrom>
        <uppmaxprojectid>a2010002</uppmaxprojectid>
        <uppmaxqos></uppmaxqos>
    </metadata>
    <inputs>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
            <samplefolder>
                <name>P1171_102</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_102</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_104</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_106</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_108</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
            <samplefolder>
                <name>P1171_104</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_106</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_108</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
            <samplefolder>
                <name>P1171_106</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
            <samplefolder>
                <name>P1171_108</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
            <samplefolder>
                <name>P1171_108</name>
                <path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
                <reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
    </inputs>
</project>
johandahlberg commented 10 years ago

This should be fixed in 52a3b702803ca4272caf7f4a817f909cebfd1e2b. I'll fix some more problems and then push a new release for testing.

johandahlberg commented 10 years ago

@vezzi Check out the latest release, test it, and see whether that solves your problem.

johandahlberg commented 9 years ago

@vezzi I think that this is fixed now, would you like to confirm that?

vezzi commented 9 years ago

Yep, I'll close it.