cvn001 / transflow

A snakemake workflow for WGS-based tuberculosis transmission analysis
GNU General Public License v3.0

[*] Error: run seqtrack R script failed. #8

Open Czirion opened 1 year ago

Czirion commented 1 year ago

Dear developers,

I am having a weird problem while running Transflow. A previous run succeeded, but after changing the snp_threshold I get an error in the transmission analysis module. My dataset has 1,652 samples; the workflow runs fine with smaller datasets.

Here is part of the error message:

=> Using SeqTrack to infer transmission events for all clusters with at least 4 samples.
==> Cluster 1 ... Using longitude and latitude information data.
Done
==> Cluster 2 ... Using longitude and latitude information data.
Done
==> Cluster 3 ... Using longitude and latitude information data.
[*] Error: run seqtrack R script failed.
Full Traceback (most recent call last):
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 2576, in run_wrapper
    run(
  File "/work/user/transflow/L2/workflow/rules/transmission_detection.smk", line 76, in __rule_transmission_network
  File "/hpc/home/user/miniconda3/envs/transflow/lib/python3.10/site-packages/snakemake/shell.py", line 294, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  python3 /work/user/transflow/L2/workflow/scripts/run_transmission_detection.py --cluster 5.Transmission_cluster/SNP_based_method/samples_cluster_SNP_12.csv --distance 4.SNP_distance/samples_pairwise_distance_matrix.txt --network True --output 5.Transmission_cluster/SNP_based_method --date /work/user/transflow/L2/metadata_date_L2_genomes.tsv --coord True --method trans 2> 5.Transmission_cluster/SNP_based_method/transmission_detection.log' returned non-zero exit status 1.

The complete log

The configfile

The command I am running: snakemake --snakefile workflow/transmission_analysis.snakefile --configfile configfile.yaml --verbose -c 16

The resources: A SLURM cluster, using #SBATCH --mem-per-cpu=32G and #SBATCH -c 16

Thank you,

Claudia

cvn001 commented 1 year ago

Hi Claudia,

The log file you uploaded shows that the transflow pipeline encountered an error when running the R package SeqTrack.

Please upload the contents of the "seqtrack.log" file in "5.Transmission_cluster/SNP_based_method/cluster_3", so that we can further investigate the cause of the error.

Best,

Xiangchen Li

Czirion commented 1 year ago

Thank you Xiangchen Li,

This is the seqtrack.log:

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C" 
2: Setting LC_TIME failed, using "C" 
3: Setting LC_MESSAGES failed, using "C" 
4: Setting LC_MONETARY failed, using "C" 
5: Setting LC_PAPER failed, using "C" 
6: Setting LC_MEASUREMENT failed, using "C" 
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
Calls: seqTrack ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': ‘M_tb_ERS6403200’, ‘M_tb_ERS6403349’, ‘M_tb_ERS6403653’ 
Execution halted

cvn001 commented 1 year ago

Thank you Claudia,

The error message you uploaded indicates that some sample names in the first column of the metadata file are duplicated. Please check the "samples.txt" file in "5.Transmission_cluster/SNP_based_method/cluster_3", or open the metadata file in Excel and highlight duplicate values to check it thoroughly.
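For a file with 1,652 samples, a scripted check may be easier than highlighting in Excel. The following is a minimal sketch, not part of transflow: the function name `find_duplicate_names` is hypothetical, and it assumes the file is tab-separated with the sample name in the first column.

```python
import csv
from collections import Counter


def find_duplicate_names(path, delimiter="\t"):
    """Return {name: count} for first-column values that occur more than once.

    Assumption: the file is delimiter-separated text and the sample name
    is the first field on each line. Blank lines are skipped.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if row and row[0].strip():
                counts[row[0].strip()] += 1
    # Keep only names seen more than once.
    return {name: n for name, n in counts.items() if n > 1}
```

Calling `find_duplicate_names` on the metadata file and on the cluster's "samples.txt" should return an empty dictionary for any file in which every sample name is unique.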

Czirion commented 1 year ago

In the metadata file, those sample names appear only once:

M_tb_ERS6403349 2017-05-04  -33.546977  20.72753    Lineage 2   lineage2.2  lineage2.2  lineage2.2  ZAF Western Cape                            False   False   S   S   R   S   R   S   R   R   R   S   S   R   S   MXF_INH_RIF_RFB_LEV_KAN
M_tb_ERS6403653 2017-02-01  -32.2171831 26.6386401  Lineage 2   lineage2.2  lineage2.2  lineage2.2  ZAF Eastern Cape                            False   False   S   S   S   S   I   S   R   R   S   R   I   S       RIF_RFB_EMB
M_tb_ERS6403200 2013-07-17  -33.546977  20.72753    Lineage 2   lineage2.2  lineage2.2  lineage2.2  ZAF Western Cape                            False   False   S   S   S   S   I   S   S   S   S   S   S   S   S   S

In the samples.txt they are indeed duplicated.
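The observation can be made reproducible by listing names that are repeated in "samples.txt" while appearing at most once in the metadata. This is only a sketch under assumptions: both files are taken to be tab-separated with the sample name in the first column (the actual layout of "samples.txt" is not shown in this thread), and the helper names are hypothetical.

```python
import csv
from collections import Counter


def count_names(path, delimiter="\t"):
    """Count how often each first-column value occurs (blank lines skipped)."""
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if row and row[0].strip():
                counts[row[0].strip()] += 1
    return counts


def introduced_duplicates(metadata_path, samples_path):
    """Return {name: count in samples file} for names repeated in the samples
    file but present at most once in the metadata file."""
    meta = count_names(metadata_path)
    samples = count_names(samples_path)
    return {
        name: n
        for name, n in samples.items()
        if n > 1 and meta.get(name, 0) <= 1
    }
```

A non-empty result would suggest that the duplication arises somewhere between the metadata and the per-cluster sample list rather than in the metadata itself.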

cvn001 commented 1 year ago

Sorry for the late reply, Claudia. I haven't encountered this kind of problem before.

Since the R error message does not include specific location information, I cannot determine where the error occurred. Could you please send me the input "metadata" file and the "samples_pairwise_distance_matrix.txt" file for testing? All fields other than the sample name can be removed from the metadata file.

Czirion commented 1 year ago

Of course, here they are:

samples_pairwise_distance_matrix.txt.gz list of samples

Thanks