NationalGenomicsInfrastructure / piper

A genomics pipeline built on top of the GATK Queue framework

sthlm2UUSNP and four-tier directory format #17

Closed mariogiov closed 10 years ago

mariogiov commented 10 years ago

While we're on a wild changing spree, we're suggesting changing the directory structure to be more granular and include more information. It would look like this:

Project
└── Sample
    └── Library Prep
        └── Sequencing Run
            ├── fastq_R1.fastq.gz
            └── fastq_R2.fastq.gz

For example:

P.Mayhem_19_99
└── P1142_101
    └── A
        └── 140528_BC423WACXX
            ├── P1142_101_NoIndex_L002_R1_001.fastq.gz
            └── P1142_101_NoIndex_L002_R2_001.fastq.gz

This provides more information and more closely tracks the database structure. I haven't come up with any disadvantages yet but let me know if you see this as a problem!
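To make the proposed layout concrete, here is a minimal sketch of how a tool could split a path into the four tiers plus the file name. The names `FastqLocation` and `parse_fastq_path` are hypothetical and are not part of piper or sthlm2UUSNP; the only assumption taken from this thread is the Project/Sample/Library Prep/Sequencing Run nesting.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class FastqLocation(NamedTuple):
    """Hypothetical record for one fastq file in the four-tier layout."""
    project: str
    sample: str
    library_prep: str
    sequencing_run: str
    filename: str


def parse_fastq_path(path: str) -> FastqLocation:
    """Read the last five path components as project/sample/libprep/run/file."""
    # Drop the root marker so an absolute path is not mistaken for a tier.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 path components, got {len(parts)}: {path}")
    return FastqLocation(*parts[-5:])


def build_fastq_path(loc: FastqLocation) -> str:
    """Rebuild the relative path from the five fields."""
    return str(PurePosixPath(*loc))
```

With the example above, `parse_fastq_path` would give project `P.Mayhem_19_99`, sample `P1142_101`, library prep `A` and sequencing run `140528_BC423WACXX`.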

Mario

johandahlberg commented 10 years ago

Sure thing. Will the "library prep" part just contain dummy info for now, or will that actually hold info on the library sequenced? Either way I can implement these changes easily if you want to have it in sthlm2UUSNP.

vezzi commented 10 years ago

The library prep part will not contain proper info on the library type, only the library number. In the example that Mario showed, if there is a second library for the same sample we will have a "B" folder.

The information about the library type will be in the DB.

@mariogiov: what about libraries without sequencing?

mariogiov commented 10 years ago

@vezzi libraries without sequencing will not have a folder as the folders are created when the files are being copied from the flowcell. Does this seem like a problem?

vezzi commented 10 years ago

Yep, but the libraries without sequencing runs will be in the DB, so at some point we might have problems due to wrong assumptions (e.g., if I have a library then I assume I have a folder somewhere).

mariogiov commented 10 years ago

Aha I see. Well so far I'm only using the database as a reference to get/store information about files and folders on Nestor, so no troubles yet, but that's something to watch out for in the future.
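The mismatch discussed above (a library recorded in the DB with no folder on disk) could be caught with a check along these lines. This is only a sketch: `libraries_without_folder` is a hypothetical helper, and the list of (sample, library prep) pairs is assumed to come from whatever DB query is in use, which is not shown here.

```python
from pathlib import Path
from typing import Iterable, List, Tuple, Union


def libraries_without_folder(
    project_dir: Union[str, Path],
    db_libraries: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the (sample, library_prep) pairs that have no folder on disk.

    db_libraries is assumed to be the pairs known to the database;
    folders are expected at <project_dir>/<sample>/<library_prep>.
    """
    project_dir = Path(project_dir)
    return [
        (sample, libprep)
        for sample, libprep in db_libraries
        if not (project_dir / sample / libprep).is_dir()
    ]
```

Anything returned here is a library that exists in the DB but was never sequenced (or never copied from the flowcell), so code should not assume a folder for it.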

mariogiov commented 10 years ago

@johandahlberg It would be awesome if you could implement this in sthlm2UUSNP as at the moment I can't test the changes I'm making

johandahlberg commented 10 years ago

I'll get on it asap (hopefully this afternoon).

johandahlberg commented 10 years ago

Sorry @mariogiov, but I don't think I'll be able to finish this today, so you'll have to wait for a fix till Monday.

vezzi commented 10 years ago

Not a big deal, I am not planning to work on this before Tuesday.

F.

On 18 Jul 2014, at 14:42, Johan Dahlberg notifications@github.com wrote:

Sorry @mariogiov, but I don't think I'll be able to finish this today, so you'll have to wait for a fix till Monday.

johandahlberg commented 10 years ago

I just pushed changes that hopefully will deal with this. Test it in your env and see that everything works.

vezzi commented 10 years ago

Great, I hope to have some time this afternoon to test this. I am working on a different project for a couple of days, so I will be back on the pipeline Wednesday afternoon at the latest.

vezzi commented 10 years ago

It took some time, but I finally tested it and it works fine!