epigen / microtest

Small test data from various data types for testing pipelines

Required Input Files and NGS input files #3

Open franceskoback-Gladstone opened 4 years ago

franceskoback-Gladstone commented 4 years ago

I am implementing this microtest repository and am wondering what should be placed in the [data_path] section in these code blocks:

required_input_files: [data_path]
ngs_input_files: [data_path]

under the atac-seq, chip-seq, and rna-seq pipelines. As I understand it, the data is read according to the microtest_annotation.csv sheet, so when the atac-seq pipeline runs the ATAC-seq_human_PE sample, it should read the data from atac-seq_PE.bam, as the amplicon_simple pipeline does successfully with the amplicon.fastq.gz file. I have left those two lines in the other pipelines alone with just [data_path] because I was not sure what data the pipelines expect here, but I am getting errors including "KeyError: 'sample_root'". Any help would be appreciated! Thank you.

afrendeiro commented 4 years ago

I believe the required_input_files and ngs_input_files elements of each pipeline in the pipeline interface should receive [data_source], but [data_path] is also still supported.

I'm therefore not sure where the problem comes from. If you'd be willing to post both the pipeline interface and the project YAML configuration along with the project CSV annotation, I could try to have a look.

franceskoback-Gladstone commented 4 years ago

Thank you so much, I really appreciate it. I don't believe I have changed anything from the original files, but have attached them below for reference. I followed the README in the microtest repository and am able to run microtest_config.tutorial.yaml, but when microtest_config.yaml attempts to run any pipeline other than the Amplicon pipeline, I run into problems. I am sure the data corresponding to the samples in my config file is present in my data folder, i.e. atac-seq_PE.bam for the second sample. I have attached the "Bad file descriptor" error messages that appear for the ATAC-seq_human.PE sample below, as well as the "KeyError: 'sample_root'" error message that appears for the Drop-seq pipeline samples.

Drop-seq_Errors.txt

ErrorMessages.txt

pipeline_interface.txt microtest_config.txt

microtest_annotation.xlsx

afrendeiro commented 4 years ago

Hmm the ValueError: can't have unbuffered text I/O error is somewhat familiar - I might be wrong but it might have been an issue in peppy/looper/pypiper in a specific version some time ago (update: maybe this https://github.com/databio/pypiper/issues/121).

Sorry I forgot to mention, but could you please tell me what versions you are running?

To check for the specific versions:

import peppy
import looper
import pypiper
print(peppy.__version__)
print(looper.__version__)
print(pypiper.__version__)

Alternatively, you could attach the output of pip freeze.

Or you could simply make sure you have all the latest versions:

pip3 install peppy==0.22.3 loopercli==0.12.4 piper==0.12.1
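
If importing the packages is itself what fails, the installed versions can also be read from the package metadata. The following is only a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and assuming the distribution names and pins from the pip command above; the helper names installed_version and compare_to_pins are made up for illustration and are not part of peppy, looper, or pypiper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names and pinned versions copied from the pip command above.
PINNED = {"peppy": "0.22.3", "loopercli": "0.12.4", "piper": "0.12.1"}


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def compare_to_pins(pins, lookup=installed_version):
    """Map each distribution name to (installed, pinned, matches)."""
    report = {}
    for name, pinned in pins.items():
        found = lookup(name)
        report[name] = (found, pinned, found == pinned)
    return report


if __name__ == "__main__":
    for name, (found, pinned, ok) in compare_to_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={found} pinned={pinned} {status}")
```

Run it with the same interpreter (and virtual environment) that looper uses, otherwise it reports on a different set of packages.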

PS: I'm assuming the excel file in attachment was by mistake, right? It should be a CSV file.

franceskoback-Gladstone commented 4 years ago

Thank you for your response! I created a new virtual environment and tried to redo the README documentation but am unfortunately still getting the same errors. I made sure I had the correct versions of peppy, loopercli, and piper as described above; you can see the output of pip freeze below.

The errors are of two types: KeyError: 'sample.root' and KeyError: 'results_subdir'. I have attached the text of these error messages below. And yes, the microtest_annotation file is a .csv file :) I just converted it to an xlsx file because GitHub did not support a .csv attachment. The text versions of the files above were copied and pasted from my .yaml files as well. I really appreciate your input on this.

pipfreeze.txt KeyError- 'sample.root'.txt KeyError-'results_subdir'.txt

franceskoback-Gladstone commented 4 years ago

Update: I have fixed that issue by adding a results_subdir: ${HOME}/microtest line to the microtest_config.yaml file and by replacing the data_sources parameter with a more specific path name. Now the atac-seq and chipmentation pipelines will run. However, I am now running into a new issue, "Error: Unable to access jarfile SamToFastq", which I suspect is related to an inconsistency between the version of the Picard toolkit I am running and the version used in this repository.

Do you know what version of the Picard toolkit you ran? Thank you very much.
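
One thing that may help narrow this down: Java prints "Unable to access jarfile X" when the argument given to -jar is not a readable file, so a message naming SamToFastq could mean the command received the tool name where the path to the Picard jar was expected, rather than a version mismatch. The following is a minimal sketch for checking a jar path before running the pipeline; the function name describe_jar and the example location under ${HOME}/tools are hypothetical and should be replaced with wherever Picard is actually installed.

```python
import os


def describe_jar(path):
    """Return a short diagnosis for a path that would be passed to `java -jar`."""
    # Expand ~ and environment variables such as ${HOME}, as a shell would.
    expanded = os.path.expandvars(os.path.expanduser(path))
    if not expanded:
        return "empty path"
    if not os.path.isfile(expanded):
        return f"not a file: {expanded}"
    if not os.access(expanded, os.R_OK):
        return f"not readable: {expanded}"
    return f"ok: {expanded}"


if __name__ == "__main__":
    # Hypothetical location; substitute the real path to picard.jar.
    print(describe_jar("${HOME}/tools/picard.jar"))
```

If the path reports as missing or empty, the pipeline configuration that points to Picard is the first place to look.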