Transipedia / dekupl-run

Identify differentially expressed k-mers between RNA-Seq datasets
MIT License

Fastq file nomenclature #55

Closed mndavies286 closed 5 years ago

mndavies286 commented 5 years ago

Hi

I'm trying to run the docker image of dekupl and am currently getting an error with the name of the Fastq files. The error is:

SystemExit in line 191 of /dekupl/Snakefile:
Invalid sample name 'SRR7702228'. Sample names must start with at least one letter and then only letters, numbers and underscore characters are allowed
File "/dekupl/Snakefile", line 191, in

The command is:

docker run --rm -v ${PWD}/my-config.json:/dekupl/my-config.json -v ${PWD}/data:/dekupl/data -v ${PWD}/results:/dekupl/results transipedia/dekupl-run --configfile my-config.json -j8 --resources ram=10 -p

And the files are paired fastqs and placed in /data

SRR7702228_1.fastq.gz SRR7702228_2.fastq.gz etc.

My config file is included below. As the file ID contains only letters and numbers and starts with a letter, I'm not quite sure what it's objecting to. Could you tell me how to fix the config file?

BW

Matt

{
  "fastq_dir": "data",
  "dekupl_counter": {
    "min_recurrence": 2,
    "min_recurrence_abundance": 5
  },
  "diff_analysis": {
    "condition": { "A": "A", "B": "B" },
    "pvalue_threshold": 0.05,
    "log2fc_threshold": 2
  },
  "samples": [
    { "name": "SRR7702228", "condition": "A" },
    { "name": "SRR7702229", "condition": "A" },
    { "name": "SRR7702240", "condition": "B" },
    { "name": "SRR7702241", "condition": "B" }
  ]
}

mndavies286 commented 5 years ago

Interestingly, the problem seems to go away if you label the samples 'normal1', 'normal2' etc.

jaudoux commented 5 years ago

Hi @mndavies286 ,

We introduced a test on sample naming because the names are later used in R (by dekupl-viewer) as column names of data frames, and R is a bit touchy about the characters used in column names.

The regex that does the matching is the following: ^[a-zA-Z][1-9a-zA-Z_]*$

However, you can see that I forgot to allow zeros (0) in the permitted characters. I will patch this and the Docker image will be rebuilt shortly after. Until then, you can try sample names without a '0'.

Thanks for spotting this issue.

Best, Jérôme.
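
For readers who want to check this behaviour themselves, here is a minimal Python sketch. It assumes the pattern quoted above is applied with Python's re module (the Snakefile is Python-based); the FIXED pattern and the is_valid helper are illustrative assumptions, not the actual patch that was applied to dekupl-run.

```python
import re

# Pattern quoted in the thread: the digit range starts at 1, so '0' is rejected.
BUGGY = re.compile(r"^[a-zA-Z][1-9a-zA-Z_]*$")

# Assumed fix (hypothetical, not taken from the repository): allow 0-9.
FIXED = re.compile(r"^[a-zA-Z][0-9a-zA-Z_]*$")


def is_valid(name, pattern):
    """Return True if the whole sample name matches the given pattern."""
    return pattern.match(name) is not None


if __name__ == "__main__":
    names = ["SRR7702228", "SRR7702229", "SRR7702240", "SRR7702241", "normal1"]
    for name in names:
        print(name, "buggy:", is_valid(name, BUGGY), "fixed:", is_valid(name, FIXED))
```

Every SRR accession in the config contains a '0', which is why all of them are rejected by the original pattern while a name such as 'normal1' passes.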

mndavies286 commented 5 years ago

Hi Jérôme.

Ah I see, I thought it might be an issue with either '0' or upper-case letters. Thank you for responding so quickly.

Best wishes

Matt

jaudoux commented 5 years ago

Hi Matt,

Just to let you know that the Docker image has been updated on Docker Hub: https://hub.docker.com/r/transipedia/dekupl-run/tags

Best, J.

mndavies286 commented 5 years ago

That's great, thank you.

One quick question: I'm not very familiar with the JSON format. If I add additional samples for conditions A and B, do I need to adjust the "diff_analysis" section to reflect this, or does it automatically compare all A samples against all B samples?

jaudoux commented 5 years ago

Hi Matt,

No need to update the "diff_analysis" section. If you add samples with condition "A" or "B" to the array of samples, it will work. You can also change the labels of the conditions; "A" and "B" are just for the toy config file.

J.
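
As an illustration of the answer above, the following Python sketch extends the config from this thread with two extra samples and leaves "diff_analysis" untouched. The sample names extra_A1 and extra_B1 and the samples_by_condition helper are hypothetical, added only for this example; the grouping merely shows which samples share a condition label and is not a description of how dekupl-run works internally.

```python
import json
from collections import defaultdict

# Config from the thread, expressed as a Python dict.
config = {
    "fastq_dir": "data",
    "dekupl_counter": {"min_recurrence": 2, "min_recurrence_abundance": 5},
    "diff_analysis": {
        "condition": {"A": "A", "B": "B"},
        "pvalue_threshold": 0.05,
        "log2fc_threshold": 2,
    },
    "samples": [
        {"name": "SRR7702228", "condition": "A"},
        {"name": "SRR7702229", "condition": "A"},
        {"name": "SRR7702240", "condition": "B"},
        {"name": "SRR7702241", "condition": "B"},
    ],
}

# Hypothetical extra samples: only the "samples" array grows.
config["samples"].append({"name": "extra_A1", "condition": "A"})
config["samples"].append({"name": "extra_B1", "condition": "B"})


def samples_by_condition(cfg):
    """Group sample names by their condition label."""
    groups = defaultdict(list)
    for sample in cfg["samples"]:
        groups[sample["condition"]].append(sample["name"])
    return dict(groups)


groups = samples_by_condition(config)

if __name__ == "__main__":
    print(groups)
    # Serialised form that could be saved as the JSON config file.
    print(json.dumps(config, indent=2))
```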