ctSkennerton / crass

The CRISPR assembler
http://ctskennerton.github.io/crass
GNU General Public License v3.0
35 stars 11 forks

Multiple .gv files per group? #99

Open MeggyC opened 4 years ago

MeggyC commented 4 years ago

Hi there,

This is not so much an issue as it is a question. When I run Crass I end up with multiple .gv files for one group/array (if I have read the documentation correctly) - am I correct in thinking that Spacers_6 is all the spacers in array group 6, with the nucleotide sequence in the title being the DR sequence?

This is what the file list looks like (for Spacers_6):

Spacers_6_CGGTTCATCCCCACGCCTGTGGGGAACAC_spacers.gv
Spacers_6_CGGTTCATCCCTGCAGGCGCAGGGAACAC_spacers.gv
Spacers_6_CGGTTTATCCCCACACCTGTGGGGAACAC_spacers.gv
Spacers_6_GTTGTGAATTCCTTACAATTTTTTATATTTGCGCGTGAATCACAAC_spacers.gv

Does this mean that an array may have the spacers shown in any of these groups, interspersed with a combination of different spacers?

Thanks!

ctSkennerton commented 4 years ago

You are correct that Spacers_6 should be the spacers from group 6 and the sequence should be the DR sequence. It's been a long time since I've developed this code so I'm not sure if the multiple sequences should be expected - my gut feeling is that the DR sequence should be unique to each group. I'll have to read through the code again to check on this. You ran crass once and got all of these files? Just want to make sure it's not something obvious like you've been running it a bunch of times with different parameters or files with the same output directory.
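A quick way to check whether any group shows up with more than one DR sequence is to parse the file names. The sketch below assumes the naming pattern seen in this thread (`Spacers_<group>_<DR sequence>_spacers.gv`); the function name `groups_with_multiple_drs` is made up for illustration and is not part of crass.

```shell
#!/usr/bin/env bash
# Flag groups whose spacer-graph files carry more than one DR sequence.
# Assumption: files are named Spacers_<group>_<DR sequence>_spacers.gv,
# as in the listing above.

# Hypothetical helper: reads file names on stdin and prints
# "<group> <number of distinct DR sequences>" for every group with more than one.
groups_with_multiple_drs() {
    # Extract "<group> <DR>" pairs, drop duplicates, then count DRs per group.
    sed -n 's/^.*Spacers_\([0-9][0-9]*\)_\([ACGTN]*\)_spacers\.gv$/\1 \2/p' |
        sort -u |
        awk '{ n[$1]++ } END { for (g in n) if (n[g] > 1) print g, n[g] }' |
        sort -n
}

# Example use on the current output directory:
#   ls | groups_with_multiple_drs
```

If this prints nothing, every group has a single DR sequence; any line it does print points at a group worth a closer look, for example because several runs wrote into the same directory.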

MeggyC commented 4 years ago

Hey there,

I've only run Crass once but it was done in parallel on a group of reads, i.e.

ls $READDIR/* | parallel -j8 crass {}

There is just one crass.crispr output file and 15 .log files; 22 read files were passed to parallel.

I don't know if that's helpful at all.

ctSkennerton commented 4 years ago

I think you've actually run crass multiple times, once for each of the input files in $READDIR. If you are intending to run crass once on all of the files in the directory at the same time you want to do something like crass $READDIR/*
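To make the difference concrete, here is a minimal sketch. The `parallel` pipeline starts one crass process per input file, all writing into the same working directory, whereas passing the glob directly starts a single process that sees every file. The wrapper name `run_crass_once` is only for illustration.

```shell
#!/usr/bin/env bash
# What the parallel pipeline does: one crass run per file, so 22 files
# means 22 independent runs sharing one output directory:
#   ls "$READDIR"/* | parallel -j8 crass {}

# A single run over every file in the directory instead.
# run_crass_once is a hypothetical wrapper, not a crass feature.
run_crass_once() {
    local read_dir="$1"
    # The shell expands the glob, so crass receives all files as arguments.
    crass "$read_dir"/*
}

# Example:
#   run_crass_once "$READDIR"
```

Quoting the directory keeps paths with spaces intact while still letting the shell expand `*` into the individual read files.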

MeggyC commented 4 years ago

Thanks Connor - that's very helpful - just one other question - do you think it's preferable to co-assemble the samples using the command above (crass $READDIR/*)? Or would you run Crass once on each read file?

ctSkennerton commented 4 years ago

I think if the files are all from one sample (like R1 and R2 files from Illumina-generated data) then they should be run together. If they are from different samples, then it depends on your biological question. If you're interested in the differences between samples then it might make more sense to run them separately and compare.
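If you do go the separate-runs route, giving each sample its own output directory avoids the mixed-up files from earlier in this thread. The sketch below rests on two assumptions: that read files are named `<sample>_R1.fastq` and `<sample>_R2.fastq` (a made-up convention, adjust to your data), and that `-o` sets the output directory, which is worth confirming against `crass --help` for your version. The function name `run_crass_per_sample` is hypothetical.

```shell
#!/usr/bin/env bash
# Run crass once per sample, keeping each sample's output apart.
# Assumptions (not from the thread): reads are named <sample>_R1.fastq and
# <sample>_R2.fastq, and `-o` selects the crass output directory.
run_crass_per_sample() {
    local read_dir="$1" out_root="$2" r1 sample
    for r1 in "$read_dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        sample=$(basename "$r1" _R1.fastq)
        mkdir -p "$out_root/$sample"
        # R1 and R2 of the same sample go into the same crass run.
        crass -o "$out_root/$sample" "$r1" "$read_dir/${sample}_R2.fastq"
    done
}

# Example:
#   run_crass_per_sample "$READDIR" crass_out
```

Each sample then ends up with its own crass.crispr and .gv files under `crass_out/<sample>/`, which can be compared across samples.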