bergmanlab / mcclintock

Meta-pipeline to identify transposable element insertions using next generation sequencing data

A question about running multi samples #96

Closed Morriyaty closed 2 years ago

Morriyaty commented 2 years ago

Hi,

I have many samples to run through the pipeline after running --make_annotations. However, I found that I have to run one sample at a time because the --make_annotations output dir only has one. I can think of two ways to work around this:

  1. Copy the --make_annotations output dir to a new dir, and have each sample use the new dir.
  2. Use a command like this: python3 mcclintock.py -r Fsr.LG.fasta -c new.fa -g annotations/Fsr.LG/reference_te_locations/unaugmented_inrefTEs.gff -t annoations/Fsr.LG/te_taxonomy/unaugmented_taxonomy.tsv -1 sample1_1.fq.gz -2 sample1_2.fq.gz -p 30 -o sample1_out -m trimgalore,ngs_te_mapper2,TEMP2 (note: I ran this command, but it stopped at [SETUP checking locations gff: /data/01/user156/wyj/01.pop.TE/annoations/Fsr.LG/reference_te_locations/unaugmented_inrefTEs.gff] for a long time)

Could you give me some suggestions on these two methods, or on any other approach you think is reasonable?

Bests, Yinjia

cbergman commented 2 years ago

Hi @wyj-lzu

To help us understand your situation, are you (i) running McClintock on a single reference genome that is used by multiple WGS samples, or (ii) running McClintock on multiple reference genomes, each with its own WGS samples?

Also, could you please clarify what you mean by "I have to run one sample at a time because the --make_annotations output dir only has one" (i.e., the output dir has only one what?)

If you are doing (i), please look at https://github.com/bergmanlab/mcclintock#running-mcclintock-with-multiple-samples-using-same-reference-genome and be sure to use the --resume option if you use the same -o output directory for all samples. If you are using different -o output directories for each sample, then make sure that your paths to the gff and tsv files are specified correctly (it looks like there are some typos in your example code above).

If the code above is pseudocode, it would be helpful if you could post the real code you are using for your analysis.

Thanks, Casey
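The note about typos can be checked mechanically. In the first command posted above, -g points under annotations/ while -t points under annoations/, so at most one of the two is likely to exist. The following is a minimal sketch, not part of McClintock: the helper names `first_missing_component` and `check_inputs` are hypothetical, and it only reports which path component is missing and whether a similarly named sibling exists. It does not explain why a run might stall.

```python
import difflib
from pathlib import Path


def first_missing_component(path):
    """Return (existing_parent, missing_name) for the first part of `path`
    that does not exist, or None if the whole path exists."""
    path = Path(path)
    if path.is_absolute():
        current, parts = Path(path.anchor), path.parts[1:]
    else:
        current, parts = Path("."), path.parts
    for part in parts:
        candidate = current / part
        if not candidate.exists():
            return current, part
        current = candidate
    return None


def check_inputs(inputs, base_dir="."):
    """Map each flag whose path is missing to a short diagnostic message.

    `inputs` maps a command-line flag (e.g. "-g") to the path given for it.
    Relative paths are interpreted relative to `base_dir`.
    """
    problems = {}
    for flag, raw in inputs.items():
        path = Path(raw)
        if not path.is_absolute():
            path = Path(base_dir) / path
        missing = first_missing_component(path)
        if missing is None:
            continue
        parent, name = missing
        # Look for a similarly spelled entry next to the missing one.
        siblings = [p.name for p in parent.iterdir()] if parent.is_dir() else []
        close = difflib.get_close_matches(name, siblings, n=1)
        hint = f"; did you mean '{close[0]}'?" if close else ""
        problems[flag] = f"'{name}' not found in '{parent}'{hint}"
    return problems


if __name__ == "__main__":
    # Paths copied from the first command in this thread.
    inputs = {
        "-g": "annotations/Fsr.LG/reference_te_locations/unaugmented_inrefTEs.gff",
        "-t": "annoations/Fsr.LG/te_taxonomy/unaugmented_taxonomy.tsv",
    }
    for flag, message in check_inputs(inputs).items():
        print(flag, message)
```

Run from the directory where McClintock is launched, this prints one line per flag whose path cannot be found and prints nothing when every path exists.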

Morriyaty commented 2 years ago

Hi

I ran multiple samples following (i), and it ran successfully. The error I encountered a few days ago did not appear this time.

I ran --make_annotations successfully (the output dir is called annoations), but I want to use different directories. So my command is: python3 /data/01/user156/software/mcclintock/mcclintock.py -r /data/01/user152/workspace/F-TE/01.Ref/Fsr.LG.fasta -c new.fa -g annoations/Fsr.LG/reference_te_locations/unaugmented_inrefTEs.gff -t annoations/Fsr.LG/te_taxonomy/unaugmented_taxonomy.tsv -1 /data/01/user152/workspace/F-TE/02.data/E_baileyi_1_1.fq.gz -2 /data/01/user152/workspace/F-TE/02.data/E_baileyi_1_2.fq.gz -p 30 -o E_baileyi_1_out -m trimgalore,ngs_te_mapper2,TEMP2. It stopped at SETUP checking locations gff: /data/01/user156/wyj/01.pop.TE/annoations/Fsr.LG/reference_te_locations/unaugmented_inrefTEs.gff for a long time.

Bests, Yinjia
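For the multi-sample setup discussed in this thread, the per-sample commands can be generated rather than typed by hand, which keeps the -g and -t paths identical across samples. The sketch below is illustrative only: `build_commands` is a hypothetical helper, the sample2 file names are invented, and it uses only flags that appear in this thread (including --resume, which is appended when all samples share one -o directory, as advised above). Whether -g and -t are still needed with --resume is an assumption here, so check the README section linked above for the exact invocation. The sketch only builds and prints the commands and does not run McClintock.

```python
import shlex


def build_commands(samples, reference, consensus, gff, taxonomy, methods,
                   mcclintock="mcclintock.py", threads=30, shared_out=None):
    """Build one McClintock command (as an argument list) per sample.

    samples:    dict mapping a sample name to its (read1, read2) FASTQ paths.
    shared_out: if given, every sample uses this -o directory and --resume
                is appended; otherwise each sample gets "<name>_out".
    """
    commands = []
    for name, (fq1, fq2) in samples.items():
        out_dir = shared_out if shared_out else f"{name}_out"
        cmd = [
            "python3", mcclintock,
            "-r", reference,
            "-c", consensus,
            "-g", gff,
            "-t", taxonomy,
            "-1", fq1,
            "-2", fq2,
            "-p", str(threads),
            "-o", out_dir,
            "-m", ",".join(methods),
        ]
        if shared_out:
            cmd.append("--resume")
        commands.append(cmd)
    return commands


# Directory name as reported in the thread (spelled "annoations" there).
ANNOTATION_DIR = "annoations/Fsr.LG"

samples = {
    "sample1": ("sample1_1.fq.gz", "sample1_2.fq.gz"),
    "sample2": ("sample2_1.fq.gz", "sample2_2.fq.gz"),  # hypothetical second sample
}

commands = build_commands(
    samples,
    reference="Fsr.LG.fasta",
    consensus="new.fa",
    gff=f"{ANNOTATION_DIR}/reference_te_locations/unaugmented_inrefTEs.gff",
    taxonomy=f"{ANNOTATION_DIR}/te_taxonomy/unaugmented_taxonomy.tsv",
    methods=["trimgalore", "ngs_te_mapper2", "TEMP2"],
)

for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Passing `shared_out="some_dir"` switches to the single-output-directory layout with --resume. The printed lines can be reviewed before anything is launched.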

cbergman commented 2 years ago

From what I understand, you appear to have experienced a transient problem that is now solved. Is that correct? If so, could you please close this issue? Thanks, Casey.