Problem reading files required by cld

mattgalbraith commented 5 years ago

cld cannot access the gene_list.txt file (and will presumably also fail to read params.txt).

Command used to run cld: docker run -v $PWD boutroslab/cld_docker cld --task=end_to_end --output-dir=. --parameter-file=./params.txt --gene-list=./gene_list.txt

Files are in the current working directory and readable, but cld throws the following error:

The gene list file ./gene_list.txt could not be opened. Either the user has no rights the read it or the file does not exist. at /usr/bin/cld line 1741.

fheigwer commented 5 years ago

Hello,

This seems to be a shortcoming in our documentation.

First, '-v /sourcedir:/targetdir' needs both a source and a target. I tend to use /data as the target.

Second, the gene list and all other required input files have to be given by their paths under that same target directory, so --gene-list=./gene_list.txt should instead be --gene-list=/data/gene_list.txt.
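Putting both points together, a small wrapper can check the inputs on the host and then hand cld only paths under /data. This is a hedged sketch, not something shipped with cld or cld_docker: HOST_DIR, MOUNT, DRY_RUN, check_input and run_cld are hypothetical names, and it assumes params.txt and gene_list.txt sit directly in the mounted directory. Database paths inside params.txt must also point below /data; the script does not check those.

```shell
#!/bin/sh
# Hypothetical wrapper around the boutroslab/cld_docker image (not part of cld).
# Assumes params.txt and gene_list.txt live directly in HOST_DIR.

HOST_DIR="${HOST_DIR:-$PWD}"   # host directory with params.txt, gene_list.txt and databases
MOUNT="/data"                  # target path inside the container (-v HOST_DIR:MOUNT)

# Fail early if an input file is not readable on the host side.
check_input() {
  if [ ! -r "$HOST_DIR/$1" ]; then
    echo "missing or unreadable on host: $HOST_DIR/$1" >&2
    return 1
  fi
}

# Check the inputs, then run the container, or only print the command when DRY_RUN=1.
# Every path handed to cld is a container path under $MOUNT, never ./ or a host path,
# and --output-dir also points at $MOUNT so results land in HOST_DIR.
run_cld() {
  check_input params.txt || return 1
  check_input gene_list.txt || return 1
  set -- docker run -v "$HOST_DIR:$MOUNT" boutroslab/cld_docker cld \
    --task=end_to_end \
    --output-dir="$MOUNT" \
    --parameter-file="$MOUNT/params.txt" \
    --gene-list="$MOUNT/gene_list.txt"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, source the file from the directory holding your inputs, then run DRY_RUN=1 run_cld to print the command, or run_cld to start the container.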

fheigwer commented 5 years ago

One also needs to download a database, preferably from our page:

https://www.dkfz.de/signaling/crispr-downloads/DATABASES/

I downloaded the Drosophila zip file, unpacked it and put it into the downloaded GitHub repo.

I then changed the database paths in the params file and the genes in the gene list, and ran the following command:

docker run -v ~/Downloads/cld_docker-master:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=. --parameter-file=/data/paramsdmel.txt --gene-list=/data/gene_list_dmel.txt

That worked just fine. Just drop a message if you're still having issues.

I'll fix the parameter file to look for the databases in /data.

mattgalbraith commented 5 years ago

Thanks, it is now running with the following command (gene_list.txt, params.txt, and genome_dir are all in the source dir, $PWD):

docker run -v $PWD:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=. --parameter-file=/data/params.txt --gene-list=/data/gene_list.txt

Output when running with a gene list of the first 10 genes:

Possible precedence issue with control flow operator at /usr/local/share/perl/5.26.1/Bio/DB/IndexedBase.pm line 845.
./Tue_May_21_15_36_49_2019
ENSCAFG00000010935 has been searched for designs. Search is done 10 %.
ENSCAFG00000031744 has been searched for designs. Search is done 20 %.
ENSCAFG00000029674 has been searched for designs. Search is done 30 %.
ENSCAFG00000010945 has been searched for designs. Search is done 40 %.
ENSCAFG00000028761 has been searched for designs. Search is done 50 %.
ENSCAFG00000010931 has been searched for designs. Search is done 60 %.
ENSCAFG00000010929 has been searched for designs. Search is done 70 %.
ENSCAFG00000010966 has been searched for designs. Search is done 80 %.
ENSCAFG00000011034 has been searched for designs. Search is done 90 %.
ENSCAFG00000011041 has been searched for designs. Search is done 100 %.
# reads processed: 1576
# reads with at least one reported alignment: 1576 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 47270 alignments
ENSCAFG00000010935 is completed 100%
ENSCAFG00000031744 is completed 100%
ENSCAFG00000029674 is completed 100%
ENSCAFG00000010945 is completed 100%
ENSCAFG00000028761 is completed 100%
ENSCAFG00000010931 is completed 100%
ENSCAFG00000010929 is completed 100%
ENSCAFG00000010966 is completed 100%
ENSCAFG00000011034 is completed 100%
ENSCAFG00000011041 is completed 100%
Number of designs excluded because their nucleotide composition was too invariable or contained TTTTT = 3863
Number of designs excluded because they did not hit any exon = 34110
Number of designs excluded because they did not hit any gene = 1956
Number of designs excluded because they hit multiple targets or none = 71
Number of designs excluded because they were not directly behind the ATG of the specified transcript = 3695
Number of designs that hit a specific target = 323
Number of successful designs = 323
Number of total possible designs = 44018
ENSCAFG00000011034 is missing from the library.
It was covered by 13 designs. Maybe it was covered to low or not found in the cld database.
ENSCAFG00000031744 is missing from the library.
It was covered by 13 designs. Maybe it was covered to low or not found in the cld database.
3 genes are missing because of to harsh design criteria or because they were not found by CLD in the data base.

Forgive me if I am missing something, but I cannot see any output files. Shouldn't they be in the source dir ($PWD)? Both out_gff and draw_html_report are requested in params.txt.

fheigwer commented 5 years ago

The "Possible precedence issue" message you get is just a warning. If you change --output-dir=. to --output-dir=/data, your result files will end up in $PWD.
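One way to see why "." did not work: the -v mapping acts like a path translation. Only paths under the container-side mount point are backed by the host directory, while a relative path such as "." resolves against the container's own working directory, which in this setup is evidently not inside the mount. A minimal sketch of that translation follows; to_host_path is a hypothetical helper, not part of cld or Docker, and it ignores symlinks and nested mounts.

```shell
#!/bin/sh
# Hypothetical helper: map a path as cld sees it inside the container to the
# host path that backs it, given a mount like -v "$PWD:/data".
to_host_path() {
  # $1 = host directory, $2 = container mount point, $3 = path as seen by cld
  case "$3" in
    "$2" | "$2"/*)
      # strip the mount prefix and graft the remainder onto the host directory
      printf '%s\n' "$1${3#"$2"}"
      ;;
    *)
      # relative paths like "." and anything outside the mount never reach the host
      echo "$3 is not under $2, so it does not reach the host" >&2
      return 1
      ;;
  esac
}

# Example calls (not executed here):
#   to_host_path "$PWD" /data /data   -> prints the value of $PWD
#   to_host_path "$PWD" /data .       -> fails: "." is not under /data
```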


mattgalbraith commented 5 years ago

Yes, output files are placed in $PWD with this command:

docker run -v $PWD:/data boutroslab/cld_docker cld --task=end_to_end --output-dir=/data --parameter-file=/data/params.txt --gene-list=/data/gene_list.txt

No HTML report, but that is not a big deal.

Thanks for the prompt responses.

fheigwer commented 5 years ago

Indeed, the HTML report is suppressed when the number of sgRNAs exceeds a certain threshold.


boutroslab/cld_docker, issue #3