SchulzLab / SNEEP

SNp Exploration and Analysis using EPigenomics data
MIT License

SNEEP test fails with invalid argument #10

Closed LeonHafner closed 4 months ago

LeonHafner commented 4 months ago

Hi all, first of all, thanks for providing this amazing tool! I am trying to get my SNEEP installation running with the tests described in the documentation, but they fail with an error. I installed SNEEP 1.0 via conda. The command I run is the following:

bash runTests.sh path/to/sneep/GRCh38.primary_assembly.genome.fa path/to/sneep/dbSNPs_sorted.txt.gz path/to/sneep/interactionsREM_PRO.txt.gz

I run it on the files downloaded from Zenodo, and I'm using hg38 as the reference genome. The Zenodo files and the FASTA file are stored in path/to/sneep. The SNEEP GitHub repository is also cloned into this folder and is therefore located at path/to/sneep/SNEEP.
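
For anyone reproducing this setup, the layout described above can be checked before starting the tests. The following is a minimal sketch, not part of SNEEP: the helper name `check_inputs` is made up for illustration, and the paths in the usage comment are the placeholders from this post.

```shell
#!/usr/bin/env bash
# check_inputs (hypothetical helper, not part of SNEEP):
# report every argument that is not a readable, non-empty file.
# Returns 0 if all files pass, 1 otherwise.
check_inputs() {
    local rc=0 f
    for f in "$@"; do
        if [ ! -r "$f" ] || [ ! -s "$f" ]; then
            echo "missing, unreadable or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage with the placeholder paths from this post:
# check_inputs path/to/sneep/GRCh38.primary_assembly.genome.fa \
#              path/to/sneep/dbSNPs_sorted.txt.gz \
#              path/to/sneep/interactionsREM_PRO.txt.gz
```

The check only confirms that the files exist and are non-empty; it says nothing about whether their format is what the tool expects.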

It always terminates with a message that some file has an invalid format. I hope you can help me. This is the output I get:

HELLO github sneep version
PFM dir: examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt
SNP file: examples/SNPs_EFO_0000612_myocardial_infarction.bed
genome file: path/to/sneep/GRCh38.primary_assembly.genome.fa
scale file: necessaryInputFiles/estimatedScalesPerMotif_1.9.txt
SNP file after VCF check: examples/SNPs_EFO_0000612_myocardial_infarction.bed
number SNPs: 1665
numMotifs: 817
number of tests: 1360305
rank bevor correction: 1358550
rank after correction: 0

real    3m25,910s
user    3m18,874s
sys 0m1,865s
HELLO github sneep version
-o outputDir: examples/SNEEP_output_expression/
-p use pvalue: 0.5
-c use pvalue_diff: 0.001
-t number threads 10
-t activeTFs: examples/RNA-seq_humanLV_hiPSC-CM.txt
-e ensemble_geneName: examples/TF_ensemblID_name_human_JASPAR2022_GRCh38p13.txt
-d threshold TF activity: 0.5
-b frequency: necessaryInputFiles/frequency.txt
-w transition matrix: necessaryInputFiles/transition_matrix.txt
PFM dir: examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt
SNP file: examples/SNPs_EFO_0000612_myocardial_infarction.bed
genome file: path/to/sneep/GRCh38.primary_assembly.genome.fa
scale file: necessaryInputFiles/estimatedScalesPerMotif_1.9.txt
SNP file after VCF check: examples/SNPs_EFO_0000612_myocardial_infarction.bed
number SNPs: 1665
TF  without ensemble id: EWSR1-FLI1
numMotifs: 530
number of tests: 882450
rank bevor correction: 881304
rank after correction: 0

real    0m21,961s
user    2m27,266s
sys 0m4,293s
--2024-04-17 11:01:38--  https://www.encodeproject.org/files/ENCFF199VHV/@@download/ENCFF199VHV.bed.gz
Resolving www.encodeproject.org (www.encodeproject.org)... 34.211.244.144
Connecting to www.encodeproject.org (www.encodeproject.org)|34.211.244.144|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://encode-public.s3.amazonaws.com/2021/02/24/dfd1dd66-d109-4a3e-88fc-32e3e2ab889f/ENCFF199VHV.bed.gz?response-content-disposition=attachment%3B%20filename%3DENCFF199VHV.bed.gz&AWSAccessKeyId=ASIATGZNGCNX7XZMMK2Z&Signature=Zh5rp91zUgfj0ocJfQYMUY3SXRc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEJn%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJIMEYCIQDcNthQUfdXuThyv%2Bcjvuf6UsEosy%2FxkChZw6yvnebpuAIhAL64eCDk6UpaIbU%2FhtuAZ8AmOqasruMyWzXV1c74abM8KrwFCNH%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEQABoMMjIwNzQ4NzE0ODYzIgy2ersvfWNRPcoUVEUqkAVmPMlvXke3lyMms8UP5Eww6ahImhX5%2BRFValHGzG%2FhOYwXWr%2BzmrCede1C7BSudTiI90v%2F4tn9lyHiG8160ehSPNwUEmDLYSc0UF%2FhzwhKfD6hgznISAwMlaboFJYWvSFDAZgrm2eSLyYZ9ZcGS84MkNfLG3oev1mniOKUTanBBOeM4JVnyhnOEQexic4t3seUZmcnjSWdVin4yh3QdW29t25fui9LXcDLd4D0cWUbqEF64cHjTyXShwO74HXR63q%2FoFoQBzzYQIq2xg4Ly65SJz3mCUr88R6GPFHB5%2BA79osrGxGITAFFxStRsa4ljFUWpQ%2BGPkfiM%2Bo5xHGKMwap2UxUhONh2rnEmxiBn1yVMvhQmebRERQCeVhWTRU%2BB%2Bl6EofCJ%2Bkiq57v%2B0m9PipX6JuRQJ9sRb0muHmihlNKDCR8z6JCVWnQ%2B%2FLqxllEe2uBM2K8%2B%2BAJIVfB%2FKc4Mg4SpXU1OQIxBjQYJCKGIboopmKvRwnFBzWLQZY7FdtP1MoEUZBEVGlF6zG3jW%2BEEB8mDMQaNarJ6VrKKnmf73rD5nbBsRL%2BXXGC6ZSjiCFe9jRP8j2Es5K1VjjLn4Rs%2FA87NlcnAGd6bLpzxZfcaF%2FEJ%2BNSenaX4LfpTSclwhyFlLR%2FDzJ6l4P4BnMqYWgLoUxsWkQUiimVDch6I%2BHOsuQjc4VsicJhMdxW87p5tYguwus7bGUfEJx4md5RWK5czPzBQyvj8i0nrLWYUOldx%2BFXQ0i%2BeSoMkwCL%2FB3sNWNnh7uU%2Ffazx6prm4hJodWbn8%2FUODgdnIKAs%2Bb1U6eJ4gKRr9yuGw6ylRlyM9JbONs3UCLiFy4DKeIRUDvKfygbnnTFM9EoFqgd%2Fj42JFgGK97ZqzDejP6wBjqwAd02UJEoCavr4rIYlrNJqRz0blxErqYqGyODATGwEcsGr37StQwOOuLEOVDZ%2BoEXYBQWw5MdQz3UqER0m2tIlujqqqBp%2FMMbbrRzYtkWF%2BEBjYQhsIGwnd9OLyB%2F8BWJkJ8ef3op6BvH1MtodyZEbQiD%2FtujgxT87LzotRfcyOdOOE3kPYribV9XMtmFVK%2B7t0tymSobYM0RCGHlbfpOZcjBax93LNF72Do47tj78%2Bb8&Expires=1713474099 [following]
--2024-04-17 11:01:39--  https://encode-public.s3.amazonaws.com/2021/02/24/dfd1dd66-d109-4a3e-88fc-32e3e2ab889f/ENCFF199VHV.bed.gz?response-content-disposition=attachment%3B%20filename%3DENCFF199VHV.bed.gz&AWSAccessKeyId=ASIATGZNGCNX7XZMMK2Z&Signature=Zh5rp91zUgfj0ocJfQYMUY3SXRc%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEJn%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLXdlc3QtMiJIMEYCIQDcNthQUfdXuThyv%2Bcjvuf6UsEosy%2FxkChZw6yvnebpuAIhAL64eCDk6UpaIbU%2FhtuAZ8AmOqasruMyWzXV1c74abM8KrwFCNH%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEQABoMMjIwNzQ4NzE0ODYzIgy2ersvfWNRPcoUVEUqkAVmPMlvXke3lyMms8UP5Eww6ahImhX5%2BRFValHGzG%2FhOYwXWr%2BzmrCede1C7BSudTiI90v%2F4tn9lyHiG8160ehSPNwUEmDLYSc0UF%2FhzwhKfD6hgznISAwMlaboFJYWvSFDAZgrm2eSLyYZ9ZcGS84MkNfLG3oev1mniOKUTanBBOeM4JVnyhnOEQexic4t3seUZmcnjSWdVin4yh3QdW29t25fui9LXcDLd4D0cWUbqEF64cHjTyXShwO74HXR63q%2FoFoQBzzYQIq2xg4Ly65SJz3mCUr88R6GPFHB5%2BA79osrGxGITAFFxStRsa4ljFUWpQ%2BGPkfiM%2Bo5xHGKMwap2UxUhONh2rnEmxiBn1yVMvhQmebRERQCeVhWTRU%2BB%2Bl6EofCJ%2Bkiq57v%2B0m9PipX6JuRQJ9sRb0muHmihlNKDCR8z6JCVWnQ%2B%2FLqxllEe2uBM2K8%2B%2BAJIVfB%2FKc4Mg4SpXU1OQIxBjQYJCKGIboopmKvRwnFBzWLQZY7FdtP1MoEUZBEVGlF6zG3jW%2BEEB8mDMQaNarJ6VrKKnmf73rD5nbBsRL%2BXXGC6ZSjiCFe9jRP8j2Es5K1VjjLn4Rs%2FA87NlcnAGd6bLpzxZfcaF%2FEJ%2BNSenaX4LfpTSclwhyFlLR%2FDzJ6l4P4BnMqYWgLoUxsWkQUiimVDch6I%2BHOsuQjc4VsicJhMdxW87p5tYguwus7bGUfEJx4md5RWK5czPzBQyvj8i0nrLWYUOldx%2BFXQ0i%2BeSoMkwCL%2FB3sNWNnh7uU%2Ffazx6prm4hJodWbn8%2FUODgdnIKAs%2Bb1U6eJ4gKRr9yuGw6ylRlyM9JbONs3UCLiFy4DKeIRUDvKfygbnnTFM9EoFqgd%2Fj42JFgGK97ZqzDejP6wBjqwAd02UJEoCavr4rIYlrNJqRz0blxErqYqGyODATGwEcsGr37StQwOOuLEOVDZ%2BoEXYBQWw5MdQz3UqER0m2tIlujqqqBp%2FMMbbrRzYtkWF%2BEBjYQhsIGwnd9OLyB%2F8BWJkJ8ef3op6BvH1MtodyZEbQiD%2FtujgxT87LzotRfcyOdOOE3kPYribV9XMtmFVK%2B7t0tymSobYM0RCGHlbfpOZcjBax93LNF72Do47tj78%2Bb8&Expires=1713474099
Resolving encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)... 52.218.218.243, 52.92.137.225, 52.92.179.137, ...
Connecting to encode-public.s3.amazonaws.com (encode-public.s3.amazonaws.com)|52.218.218.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3353224 (3,2M) [binary/octet-stream]
Saving to: ‘ENCFF199VHV.bed.gz’

ENCFF199VHV.bed.gz               100%[========================================================>]   3,20M  2,40MB/s    in 1,3s    

2024-04-17 11:01:41 (2,40 MB/s) - ‘ENCFF199VHV.bed.gz’ saved [3353224/3353224]

HELLO github sneep version
-o outputDir: examples/SNEEP_output_open_chromatin/
-p use pvalue: 0.5
-c use pvalue_diff: 0.001
-t number threads 10
-b frequency: necessaryInputFiles/frequency.txt
-w transition matrix: necessaryInputFiles/transition_matrix.txt
-f footprint/region file: ENCFF199VHV.bed
PFM dir: examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt
SNP file: examples/SNPs_EFO_0000612_myocardial_infarction.bed
genome file: path/to/sneep/GRCh38.primary_assembly.genome.fa
scale file: necessaryInputFiles/estimatedScalesPerMotif_1.9.txt
SNP file after VCF check: examples/SNPs_EFO_0000612_myocardial_infarction.bed
number SNPs: 153
numMotifs: 817
number of tests: 125001
rank bevor correction: 124719
rank after correction: 0

real    0m19,856s
user    1m55,100s
sys 0m5,704s
HELLO github sneep version
-o outputDir: examples/SNEEP_output_REM_PRO_HiC/
-p use pvalue: 0.5
-c use pvalue_diff: 0.001
-t number threads 10
-b frequency: necessaryInputFiles/frequency.txt
-w transition matrix: necessaryInputFiles/transition_matrix.txt
-r REMs: path/to/sneep/interactionsREM_PRO.txt.gz
-g ensemblID to GeneName mapping: ensemblID_GeneName.txt
PFM dir: examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt
SNP file: examples/SNPs_EFO_0000612_myocardial_infarction.bed
genome file: path/to/sneep/GRCh38.primary_assembly.genome.fa
scale file: necessaryInputFiles/estimatedScalesPerMotif_1.9.txt
SNP file after VCF check: examples/SNPs_EFO_0000612_myocardial_infarction.bed
number SNPs: 1665
numMotifs: 817
number of tests: 1360305
rank bevor correction: 1358655
rank after correction: 0

real    0m35,901s
user    3m35,095s
sys 0m7,063s
HELLO github sneep version
-o outputDir: examples/SNEEP_output_background_sampling/
-p use pvalue: 0.5
-c use pvalue_diff: 0.001
-b frequency: necessaryInputFiles/frequency.txt
-w transition matrix: necessaryInputFiles/transition_matrix.txt
-t number threads 20
-j number of randmoly sampled backgrounds: 100
-k path to dbSNPs: path/to/sneep/dbSNPs_sorted.txt.gz
-l seed: 2
-q min TF count: 0
-r REMs: interactionsREM_PRO_HiC.txt
-g ensemblID to GeneName mapping: ensemblID_GeneName.txt
PFM dir: examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt
SNP file: examples/SNPs_EFO_0000612_myocardial_infarction.bed
genome file: path/to/sneep/GRCh38.primary_assembly.genome.fa
scale file: necessaryInputFiles/estimatedScalesPerMotif_1.9.txt
SNP file after VCF check: examples/SNPs_EFO_0000612_myocardial_infarction.bed
Error: Unable to open file interactionsREM_PRO_HiC.txt. Exiting.
number SNPs: 1665
numMotifs: 817
number of tests: 1360305
rank bevor correction: 1358655
rank after correction: 0
start random sampling
number TFs: 817
number TFs considered background sampling: 457
before sampling
terminate called after throwing an instance of 'std::invalid_argument'
  what():  invalid file format???`
runTests.sh: line 25: 1620923 Aborted                 (core dumped) differentialBindingAffinity_multipleSNPs -o examples/SNEEP_output_background_sampling/ -p 0.5 -c 0.001 -b necessaryInputFiles/frequency.txt -x necessaryInputFiles/transition_matrix.txt -n 20 -j 100 -k ${dbSNP} -l 2 -q 0 -r interactionsREM_PRO_HiC.txt -g ensemblID_GeneName.txt examples/combined_Jaspar2022_Hocomoco_Kellis_human_transfac.txt examples/SNPs_EFO_0000612_myocardial_infarction.bed ${genome} necessaryInputFiles/estimatedScalesPerMotif_1.9.txt

real    0m23,425s
user    3m43,186s
sys 0m21,873s
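
In a log this long, the first reported problem is easy to miss: the line `Error: Unable to open file interactionsREM_PRO_HiC.txt. Exiting.` appears well before the abort. Below is a small sketch for pulling such lines out of a saved log. The function name `find_errors` and the log file name `runTests.log` are assumptions for illustration, and the patterns are taken only from the messages visible in the output above.

```shell
#!/usr/bin/env bash
# find_errors (hypothetical helper): print, with line numbers, the lines of
# a saved log that match the failure messages seen in the output above.
find_errors() {
    grep -nE '^Error:|terminate called|what\(\):|core dumped' -- "$1"
}

# Usage, assuming the test output was first saved to a file:
# bash runTests.sh path/to/sneep/GRCh38.primary_assembly.genome.fa \
#     path/to/sneep/dbSNPs_sorted.txt.gz \
#     path/to/sneep/interactionsREM_PRO.txt.gz > runTests.log 2>&1
# find_errors runTests.log
```
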
BaumgartenNina commented 4 months ago

Hi :)

Thanks for posting the issue! I will try to reproduce the error and let you know how to fix it.

Best, Nina

BaumgartenNina commented 4 months ago

Hi @LeonHafner,

Can you unzip the dbSNP file and try again?
Can you also pull our GitHub repo again? There was a hard-coded path to the interaction file in the script runTests.sh, which explains why the file was not found in the last test. I adapted runTests.sh accordingly.
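
The two steps can be sketched as follows. This is only an illustration: the helper `ensure_unzipped` is a made-up name, not part of SNEEP, and the paths in the usage comment are the placeholders from the original post. The helper writes the decompressed copy next to the `.gz` file and leaves the archive in place.

```shell
#!/usr/bin/env bash
# ensure_unzipped (hypothetical helper): print the path of an uncompressed
# copy of "$1", creating it next to the .gz file if it does not exist yet.
# Paths that do not end in .gz are printed unchanged.
ensure_unzipped() {
    local src="$1" out
    case "$src" in
        *.gz)
            out="${src%.gz}"
            if [ ! -s "$out" ]; then
                # Decompress to stdout so the original archive is kept;
                # remove the partial output if decompression fails.
                gzip -dc -- "$src" > "$out" || { rm -f -- "$out"; return 1; }
            fi
            printf '%s\n' "$out"
            ;;
        *)
            printf '%s\n' "$src"
            ;;
    esac
}

# Usage with the placeholder paths from the original post:
# git -C path/to/sneep/SNEEP pull
# dbsnp=$(ensure_unzipped path/to/sneep/dbSNPs_sorted.txt.gz)
# bash runTests.sh path/to/sneep/GRCh38.primary_assembly.genome.fa "$dbsnp" path/to/sneep/interactionsREM_PRO.txt.gz
```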

I hope it will work for you now :)

Best, Nina

LeonHafner commented 4 months ago

Thanks a lot, that solved it!

Best, Leon