akcorut / kGWASflow

kGWASflow is a Snakemake workflow for performing k-mers-based GWAS.
https://github.com/akcorut/kGWASflow/wiki
MIT License
28 stars 8 forks source link

Clarification about cutadapt config #16

Closed Ax3man closed 1 year ago

Ax3man commented 1 year ago

Hi Kivanc,

I have a question about the configuration of the adapter trimming part of the pipeline. It's not clear to me from the documentation on the config how to provide the adapter sequences for paired-end reads.

Here's the relevant config block:

  # ----------------------------------------------------------------------
  #     cutadapt Params
  # ----------------------------------------------------------------------
  cutadapt:
    # Number of threads for cutadapt
    threads: 8

    # See cutadapt manual:
    # https://cutadapt.readthedocs.io/en/stable/guide.html
    # Set params for single end files
    se:
      adapters: ""
      extra: ""
    # Set params for paired end files
    # https://cutadapt.readthedocs.io/en/stable/guide.html
    pe:
      adapters: ""
      extra: ""

Reading the cutadapt docs makes it clear both an -a and -A argument should be provided with the two adapters. Should I be providing these flags like so?

    pe:
      adapters: "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
      extra: ""

Thanks so much for your help.

akcorut commented 1 year ago

Hi @Ax3man,

Yes, you can provide the adapters exactly like that. Here is an example I have previously used:

Screenshot 2023-08-23 at 7 06 56 PM

Hope this helps. Also, thanks for letting me know that the documentation on the config is not clear enough. I will add more information to make it more clear.

Best, Kivanc

Ax3man commented 1 year ago

Great, that's helpful! Yes, a simple example can help a lot, like

# Set params for paired end files (e.g. "-a ACGT -A ACGT")

In the mean time, I did run into another issue unfortunately. The kmer_gwas step is failing for me, with the following log file:

Traceback (most recent call last):
  File "/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/scripts/external/kmers_gwas/src/py/align_kinship_phenotype.py", line 83, in <module>
    main()
  File "/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/scripts/external/kmers_gwas/src/py/align_kinship_phenotype.py", line 56, in main
    phenotype_val = get_column(args.fn_phenotype, 1)[1:] # Phenotypes values
  File "/MankFlex/wouter/color_selection_lines/kmergwas/kgwasflow/scripts/external/kmers_gwas/src/py/functions.py", line 14, in get_column
    return [line.split(sep)[index] for line in file(fn,"r").read().split("\n")[:-1]]
IndexError: list index out of range
Error in file(file, "rt") : cannot open the connection
Calls: read.csv -> read.table -> file
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'results/kmers_gwas/car_perc/pheno.phenotypes': No such file or directory
Execution halted
Traceback (most recent call last):
  File "scripts/external/kmers_gwas/kmers_gwas.py", line 278, in <module>
    main()
  File "scripts/external/kmers_gwas/kmers_gwas.py", line 107, in main
    phenotypes_names = file(paths["pheno_permuted_fn"] ,"r").read().split("\n")[0].split("\t")[1:]
IOError: [Errno 2] No such file or directory: 'results/kmers_gwas/car_perc/pheno.phenotypes_and_permutations'

The error seems to be generated by R when it cannot find results/kmers_gwas/car_perc/pheno.phenotypes. Indeed, that directory doesn't have that file:

[mank04] kgwasflow $ ls -l results/kmers_gwas/car_perc/
total 20
-rw-r--r-- 1 wouter member 1653 Aug 24 13:01 log_file
-rw-r--r-- 1 wouter member 7004 Aug 24 13:01 pheno_for_accessions_order.fam
-rw-r--r-- 1 wouter member 7505 Aug 24 13:01 pheno.original_phenotypes
-rw-r--r-- 1 wouter member    0 Aug 24 13:01 phenotypes_transformation_permutation.log

Any clue why the pheno.phenotypes file wasn't generated?

Ax3man commented 1 year ago

Ok, I made a mistake and my phenotype files were comma separated, instead of tab-separated. Changing them to the correct format seems to solve this issue, as this step is now running.

Closing this now, but perhaps a more useful error could be generated somewhere?

Thanks again Kivanc!