Closed LisaHollstein closed 2 years ago
Thanks Lisa, it's probably not memory, but can you try again after editing the following in the growth_rate process of the .nf file?
memory { 32.GB * task.attempt }
or
memory { 64.GB * task.attempt }
I'll try to find tests which work and fail in the next few days
Also please retry with these data
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -o SRR11207337_metagenome_mock_dna_1.fastq.gz
or
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR112/037/SRR11207337/SRR11207337_1.fastq.gz -O SRR11207337_metagenome_mock_dna_1.fastq.gz
then gunzip that file
Then
head - n 250000 x.fastq > mock_R1.fastq
and
head - n 200000 x.fastq > mock_200k_R1.fastq
Now you have two files to test Wochenende, haybaler and growth rates
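Since a FASTQ record spans four lines, taking the first 250000 lines keeps 62,500 reads and taking the first 200000 lines keeps 50,000 reads. Here is a minimal Python sketch of the same truncation, expressed in reads rather than lines. The function name and the tiny demo file are made up for illustration, and it assumes plain four-line records (no wrapped sequences).

```python
import os
import tempfile
from itertools import islice


def take_first_reads(src, dst, n_reads):
    """Copy the first n_reads FASTQ records (4 lines each) from src to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        lines = list(islice(fin, n_reads * 4))
        fout.writelines(lines)
    # Number of complete records actually written
    return len(lines) // 4


# Demo on a throwaway three-read file (made-up reads, not real data)
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "x.fastq")
dst_path = os.path.join(tmp_dir, "mock_R1.fastq")
with open(src_path, "w") as handle:
    for i in range(1, 4):
        handle.write(f"@read{i}\nACGT\n+\nJJJJ\n")

written = take_first_reads(src_path, dst_path, 2)
with open(dst_path) as handle:
    dst_line_count = sum(1 for _ in handle)
print(written, dst_line_count)
```

Working in reads avoids accidentally cutting a record in half when the line count is not a multiple of four.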
As expected, this did not solve the issue
What parameters and which reference do you use for this?
All params direct from https://github.com/MHH-RCUG/nf_wochenende/blob/colin_dev/nextflow.config
There are problems with growth_rate output though. It outputs a folder, which may overwrite other folders if they exist. So I'm trying to just output the .csv files, which should all have unique names.
Using the csv as output approach, it looks a lot better.
I get for example the following in the output folder. Please update your branch from colin_dev and check? Thanks
nf_wochenende/output/growth_rate/fit_results/output$ ls -1
DRR_sm_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809359_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd_subsamples_results.csv
edit: some files are empty (header only), but all processes ran successfully, with no errors in Nextflow here
cat *
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE015929_1_Staphylococcus_epidermidis_ATCC_12228__complete_genome_BAC_pos,failed,2.06,1892,50,29,6.55,[-2]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP006716_1_Staphylococcus_haemolyticus_JCSC1435_DNA__complete_genome_BAC_pos,moderate,1.43,187512,50,47,1.99,[]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP010969_1_Streptococcus_intermedius_JTH08_DNA__complete_genome_BAC_pos,failed,1.72,1225,50,16,4.28,"[-2, -3]"
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP003800_1_Streptococcus_constellatus_subsp__pharyngis_C232__complete_genome_BAC_pos,failed,1.93,1994,50,21,10.81,"[-3, -5]"
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP003860_1_Streptococcus_anginosus_C1051__complete_genome_BAC_pos,moderate,1.50,18931,50,35,2.92,[]
ERR9809370_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP020618_1_Staphylococcus_hominis_subsp__hominis_strain_K1_chromosome__complete_genome_BAC_pos,slow,1.19,297998,50,44,1.77,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE015929_1_Staphylococcus_epidermidis_ATCC_12228__complete_genome_BAC_pos,moderate,1.79,12762,50,48,1.40,[]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP002925_1_Streptococcus_pseudopneumoniae_IS7493__complete_genome_BAC_pos,failed,1.09,1606,50,29,11.11,[-5]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP020618_1_Staphylococcus_hominis_subsp__hominis_strain_K1_chromosome__complete_genome_BAC_pos,slow,1.22,59204,50,44,2.18,[]
ERR9809371_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_FN568063_1_Streptococcus_mitis_B6_complete_genome__strain_B6_BAC_pos,fast,2.79,3388,50,32,3.59,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE004091_2_Pseudomonas_aeruginosa_PAO1__complete_genome_BAC_pos,no growth,1.10,8782,50,50,1.00,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE006468_2_Salmonella_enterica_subsp__enterica_serovar_Typhimurium_str__LT2__complete_genome_BAC_pos,no growth,1.00,5061,50,48,1.82,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE016830_1_Enterococcus_faecalis_V583_chromosome__complete_genome_BAC_pos,moderate,1.64,3239,50,36,2.76,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE017262_2_Listeria_monocytogenes_str__4b_F2365__complete_genome_BAC_pos,no growth,1.06,3727,50,50,2.32,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AJ938182_1_Staphylococcus_aureus_RF122_complete_genome_BAC_pos,no growth,1.04,3438,50,44,2.77,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP011051_1_Bacillus_intestinalis_strain_T30__complete_genome_BAC_pos,moderate,1.31,4804,50,50,1.31,[]
public_mock_qsm3_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_U00096_3_Escherichia_coli_str__K_12_substr__MG1655__complete_genome_BAC_pos,slow,1.13,4703,50,49,2.10,[]
Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE004091_2_Pseudomonas_aeruginosa_PAO1__complete_genome_BAC_pos,slow,1.10,11022,50,50,0.86,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE006468_2_Salmonella_enterica_subsp__enterica_serovar_Typhimurium_str__LT2__complete_genome_BAC_pos,no growth,1.00,6297,50,48,1.98,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE016830_1_Enterococcus_faecalis_V583_chromosome__complete_genome_BAC_pos,moderate,1.48,4026,50,37,3.10,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AE017262_2_Listeria_monocytogenes_str__4b_F2365__complete_genome_BAC_pos,slow,1.18,4651,50,50,1.82,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AJ938182_1_Staphylococcus_aureus_RF122_complete_genome_BAC_pos,no growth,1.05,4264,50,45,2.81,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_AP008937_1_Lactobacillus_fermentum_IFO_3956_DNA__complete_genome_BAC_pos,failed,1.08,1148,50,20,19.04,"[-2, -3, -5]"
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_CP011051_1_Bacillus_intestinalis_strain_T30__complete_genome_BAC_pos,moderate,1.36,5981,50,50,1.17,[]
public_mock_qsm_R1.ndp.trm.s.mm.dup.mq30.calmd.filt_1_U00096_3_Escherichia_coli_str__K_12_substr__MG1655__complete_genome_BAC_pos,slow,1.14,5920,50,49,2.00,[]
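To spot the header-only files without reading the concatenated output by eye, something like the following Python sketch could be used. It assumes the column names shown in the output above and the `*_subsamples_results.csv` naming. The function name, the shortened file names and the `sample_A`/`sample_B` row names are placeholders for illustration; the numeric values are copied from two rows above.

```python
import csv
import tempfile
from pathlib import Path


def summarise_results(folder):
    """Map each *_subsamples_results.csv to its data-row and failed-row counts."""
    summary = {}
    for csv_path in sorted(Path(folder).glob("*_subsamples_results.csv")):
        with open(csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        failed = sum(1 for row in rows if row["Growth_class"] == "failed")
        summary[csv_path.name] = {"rows": len(rows), "failed": failed}
    return summary


HEADER = "Name,Growth_class,Growth_Rate,No_Reads,Initial_Bins,Used_Bins,Fit_Err,Error_Codes\n"

# Demo: one header-only file and one file with two rows (names shortened)
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "DRR_sm_subsamples_results.csv").write_text(HEADER)
Path(demo_dir, "ERR9809370_subsamples_results.csv").write_text(
    HEADER
    + 'sample_A,failed,1.72,1225,50,16,4.28,"[-2, -3]"\n'
    + "sample_B,moderate,1.43,187512,50,47,1.99,[]\n"
)

summary = summarise_results(demo_dir)
header_only = [name for name, info in summary.items() if info["rows"] == 0]
print(summary)
print("header only:", header_only)
```

The `csv` module is used rather than splitting on commas because the Error_Codes column can contain quoted lists such as "[-2, -3]".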
The wochenende process doesn't work with this data... Maybe the file is corrupt? Or there are still problems with having both Wochenende and nf_wochenende installed
Did you change all the settings on lines 20 and 38-43 ?
The file is not corrupt, if you mean the nextflow.config? Or which _R1.fastq input files are you using?
If the fastq file is corrupt, you should be able to see it using
head x_R1.fastq
tail x_R1.fastq
edit - maybe you forgot to gunzip the file before using head to take a small portion of it?
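Looking at head and tail only covers the two ends of the file. A rough Python sketch of a fuller check is below. It is an illustration only, not part of Wochenende. It assumes plain four-line FASTQ records, and the function name and demo reads are made up. It also catches the "forgot to gunzip" case by looking for the gzip magic bytes.

```python
import gzip
import os
import tempfile


def check_fastq(path):
    """Return a list of problems found in an uncompressed four-line FASTQ file."""
    with open(path, "rb") as handle:
        if handle.read(2) == b"\x1f\x8b":  # gzip magic bytes
            return ["file is still gzip-compressed; gunzip it first"]

    problems = []
    record = []
    n_records = 0
    with open(path, "r", errors="replace") as handle:
        for line in handle:
            record.append(line.rstrip("\r\n"))
            if len(record) < 4:
                continue
            n_records += 1
            header, seq, plus, qual = record
            if not header.startswith("@"):
                problems.append(f"record {n_records}: header does not start with '@'")
            if not plus.startswith("+"):
                problems.append(f"record {n_records}: separator line does not start with '+'")
            if len(seq) != len(qual):
                problems.append(f"record {n_records}: sequence and quality lengths differ")
            record = []
    if record:
        problems.append(f"file ends with an incomplete record ({len(record)} of 4 lines)")
    return problems


# Demo files (made-up reads): intact, truncated mid-record, and still gzipped
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "good_R1.fastq")
truncated = os.path.join(tmp_dir, "truncated_R1.fastq")
zipped = os.path.join(tmp_dir, "zipped_R1.fastq")

with open(good, "w") as handle:
    handle.write("@read1\nACGT\n+\nJJJJ\n@read2\nGGCC\n+\nFFFF\n")
with open(truncated, "w") as handle:
    handle.write("@read1\nACGT\n+\nJJJJ\n@read2\nGGCC\n")
with gzip.open(zipped, "wt") as handle:
    handle.write("@read1\nACGT\n+\nJJJJ\n")

good_problems = check_fastq(good)
truncated_problems = check_fastq(truncated)
zipped_problems = check_fastq(zipped)
print(good_problems, truncated_problems, zipped_problems)
```

An empty list means no structural problem was found; it does not prove the reads themselves are sensible.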
head looks like this:
@SRR11207337.1 1/1
CTAATAGTTGATAACTAAATAGAAAATATTTACTCATGTTTCACCTCCTTTCAATTTGACAATTAGATCACCAAACAATTTCCATTCATTTGGCCCAGGTGGATTTTTCCAAATTACTTGCCGACATCTTATAC
+
AAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJFJ<FFAJJFFAAJF<JFAFAJJAFJFFJ-FFFJFFAFFJJJAFFFF<F7A77FA
@SRR11207337.2 2/1
TAGACTGTTCTTATTGTTAACACAAGGGAGAAGAGATGATGCGCGTACTGGTTGTAGAGGATAATGCATTATTACGCCACCACCTGAAGGTTCAGCTCCAAGATTCAGGTCACCAGGTCGATGCCAC
+
AAFFJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJAJJFJJJFJJJJJJJJJJJFJJJFJJJJJFFFJJJJFJJJJAFJF7JJFJJ-FFFJJ-FJJFJJFFJJJJAA7-<AFAFAF<
@SRR11207337.3 3/1
GGTGAGGCGTCCTCTTTGGTTGACGAAAGGGCGCTGATCGCCCGGTTGAGCTGGTTTTGCCGGGAGTAGTAGCTACTCCCGACGGCGTAACCCCCGATCAAGACGACCGCCGCC
I am now trying it with the mock fastq
Looks ok. How about the tail x.fastq?
Did the alignment work now?
I don't think there should be problems having both versions installed, since we overwrite the env variables (only for the bash shell which is created out of nextflow, and destroyed at the end of the nextflow wochenende process) at the start of each process, eg here
nf_wochenende.nf line 329
export WOCHENENDE_DIR=${params.WOCHENENDE_DIR}
export HAYBALER_DIR=${params.HAYBALER_DIR}
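The scoping argument can be checked with a small Python sketch: a variable exported inside a child shell is visible there but does not change the parent's environment. The two paths are hypothetical, and `sh` stands in for the bash shell that Nextflow starts for each process.

```python
import os
import subprocess

# Hypothetical paths, used only for this demonstration
os.environ["WOCHENENDE_DIR"] = "/opt/classic/Wochenende"

# The child shell overrides the variable, as a process script would on startup
script = 'export WOCHENENDE_DIR="/opt/nf/Wochenende"; printf "%s" "$WOCHENENDE_DIR"'
result = subprocess.run(["sh", "-c", script], capture_output=True, text=True, check=True)

inside_child = result.stdout
after_child = os.environ["WOCHENENDE_DIR"]
print("inside child shell:", inside_child)
print("parent afterwards: ", after_child)
```

The child sees the overridden value while the parent keeps its original one, which is why a second installation should not interfere as long as every process exports the variables first.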
I saw an error in the head commands above, should be head -n 200000 not head - n 200000
You probably corrected that already though.
head -n 250000 x.fastq > mock_R1.fastq
and
head -n 200000 x.fastq > mock_200k_R1.fastq
True, I am just a bit clueless why it won't work...
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFFFF
@SRR11207337.49999 49999/1
ACATTATAGCACAGCTGATTTTAGATTGTAATACTAATTTGTATTATTTTAGCTGACTAATTATCTTTCAAGTGAATAATTGTTCATAATGCTTGTTTTTACGTCTTTAAAAAGTAGAAATTTATTTCACACGCCTTTCAATATACATACC
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@SRR11207337.50000 50000/1
AAACTGGCGGGCATTGACGATAAGATCGCGCGCCATGGTCAATGCACGTCGTTCACATCCTGCGCGAAGCTCTGTATTTTCAATCTGTTTCAGCTCGGCAACGCCAGCAAAACCAATGATCCGGCGATTCATTTCCATGCCGGCATTCACC
+
AAAFFJFAJJJJAJFFJJFJJJJJJJJJJFFFFJJJJJFJJJJFJFJJJJJJFJJFJJJJJJJJJJJJJF<FAFJJJJJJJJJJJJFJJJJJJJJJFJJFAJJJJJJJJJJFJJJJJJJJJJJJJJFJJFAJJFJAAJJJFFFJJJ777FF
Tail also looks okay
Up until now the problems with Wochenende process were
I think that was all of them, and most have since been improved.
The wochenende stage is working with other data, eg JuFo etc, right ? Bit strange.
Maybe try redownloading the fastq though it does look ok. Maybe there's file corruption on one line in the middle, bit doubtful though.
The wochenende stage is working with other data, but I also already redownloaded the fastq, so none of the listed problems seem plausible
The raspir and growth rate stages fail with data from Ilona as well...
This is weird.
Perhaps your conda/mamba environments are now too old, if they were installed with the classic Wochenende (would be strange though). The modern nf_wochenende uses a slimmed down conda env. Then just change the reference to it in the code.
However - did you get and test the fastq data from this repo? These are two small test fastq files from a mock community with even bacterial coverage, and they have been working well for me for all stages.
https://github.com/colindaven/ref_testing
edit - are you using the main branch, or dev, or lisa_config? You can try with colin_dev and just change the paths and cluster in the nextflow.config and config.yaml, maybe some errors crept in during the git process?
I have a separate conda env for nf_wochenende, so I don't use the old wochenende conda env.
(I still need to test with the mock community and I will also try with colin_dev)
Ilona tested nf_wochenende and all stages except raspir worked.
Growth rate worked for me, using the main branch and preterm sequencing data.
Running it on colin_dev doesn't change anything for me
I think the problem is that pandas is missing from the nf_wochenende conda env
Hi @LisaHollstein perhaps you need to update, pandas is listed here - and in main branch too, just checked.
https://github.dev/MHH-RCUG/nf_wochenende/blob/colin_dev/env.wochenende.minimal.yml
Yeah, what an easy solution... I wasted way too much time on this...
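A quick way to catch this kind of problem earlier is to check, from inside the activated conda env, whether the modules a stage needs can be found at all. Below is a minimal Python sketch. Only pandas comes from this thread; the function name and the fake module name are made up, and any longer list of required modules would be an assumption.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# pandas is the module named in the thread
required = ["pandas"]
print("missing from this env:", missing_modules(required))

# Demo with one standard-library module and one deliberately fake name
demo_missing = missing_modules(["json", "no_such_module_for_demo"])
print(demo_missing)
```

Run with the Python interpreter of the env that the failing process uses, since a different env may give a different answer.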
I have some data for which no growth rates are created by nf_wochenende. However, the original Wochenende calculates the growth rates.
I get no useful error message. Nextflow just outputs:
And the logs are also almost empty. .command.log and .command.out only contain:
So the analysis was not completed.
.command.sh contains:
.command.run has a lot of rows and is therefore a little bit confusing.
.command.begin, .command.err and .command.trace are completely empty.
.exitcode is 1
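When going through a failed task's work directory by hand, a small Python sketch like the one below can list which of these files are empty or absent and read the exit code in one go. The file names are the ones mentioned above; the function name and the throwaway demo directory are made up for illustration.

```python
import tempfile
from pathlib import Path

TASK_FILES = [
    ".command.sh", ".command.run", ".command.begin", ".command.log",
    ".command.out", ".command.err", ".command.trace", ".exitcode",
]


def inspect_task_dir(task_dir):
    """Return ({file name: size in bytes, or None if absent}, exit code or None)."""
    task_dir = Path(task_dir)
    sizes = {}
    for name in TASK_FILES:
        path = task_dir / name
        sizes[name] = path.stat().st_size if path.exists() else None
    exit_code = None
    exit_file = task_dir / ".exitcode"
    if exit_file.exists():
        text = exit_file.read_text().strip()
        if text.lstrip("-").isdigit():
            exit_code = int(text)
    return sizes, exit_code


# Demo on a throwaway directory that mimics a failed task
demo = Path(tempfile.mkdtemp())
(demo / ".command.sh").write_text("#!/bin/bash\necho placeholder\n")
(demo / ".command.err").write_text("")
(demo / ".exitcode").write_text("1")

sizes, exit_code = inspect_task_dir(demo)
empty = sorted(name for name, size in sizes.items() if size == 0)
absent = sorted(name for name, size in sizes.items() if size is None)
print("exit code:", exit_code)
print("empty:", empty)
print("absent:", absent)
```

A non-zero exit code together with an empty .command.err, as reported here, suggests the failure message went somewhere else, which fits the missing Python module found later in the thread.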