Open sah4030 opened 2 months ago
Hi, thanks for the feedback. However, I cannot replicate the error you got. When I run the preprocess.R script, it finishes without problems.
I run it as follows:
navigate to the directory of the cloned repository.
conda activate AlphScore
Rscript workflow/scripts/preprocess.R -i results/train_testset1/gnomad_extracted.csv.gz -o gnomad_extracted_prepro.csv.gz
The error you are referring to is probably produced by this line:
double_ids<-str_split(n_occur[n_occur$Freq > 1,]$Var1, " ", simplify=TRUE)[,1]
In the error message you posted, the "1]" is missing at the end; maybe this is the issue?
You could try to recreate the AlphScore environment with this file: https://github.com/Ax-Sch/AlphScore/blob/main/workflow/envs/AlphScore_exact_versions_for_documentation.yaml
Otherwise, please provide more context (which system, environment, and versions are you using?).
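To gather that context, a small shell snippet along these lines might help. It is only a sketch: the output file name env_report.txt is a placeholder chosen for illustration, and it assumes the conda environment is named AlphScore as in the commands above. Tools that are not found on the PATH are skipped.

```shell
#!/bin/sh
# Sketch: collect system, conda, and R version details into one file
# that can be attached to the issue. "env_report.txt" is a placeholder name.
report="env_report.txt"
{
  echo "## system"
  uname -a

  if command -v conda >/dev/null 2>&1; then
    echo "## conda"
    conda --version
    # List the packages of the AlphScore environment, if it exists.
    conda list -n AlphScore 2>/dev/null || echo "(no environment named AlphScore found)"
  else
    echo "## conda: not found on PATH"
  fi

  if command -v Rscript >/dev/null 2>&1; then
    echo "## R"
    # R version, platform, and attached packages.
    Rscript -e 'sessionInfo()' 2>&1
    # Version of stringr, the package that provides str_split.
    Rscript -e 'packageVersion("stringr")' 2>&1 || true
  else
    echo "## R: not found on PATH"
  fi
} > "$report"
echo "Wrote $report"
```

Running it after conda activate AlphScore makes Rscript report the R and package versions of that environment rather than those of the system-wide installation.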
Best, Axel
When running this command:
Rscript workflow/scripts/preprocess.R -i gnomad_extracted.csv.gz -o gnomad_extracted_prepro.csv.gz
I got this error message: Error in str_split(n_occur[n_occur$Freq > 1, ]$Var1, " ", simplify = TRUE)[, : subscript out of bounds