Roleren / ORFik

MIT License

STAR.align.folder() STAR Index Error #125

Closed matanel-y closed 8 months ago

matanel-y commented 1 year ago

Hello,

I am experiencing the same issue as described in the comment quoted below. Was it ever fixed? Thanks

"Yeah, think I figured it out. The bash call for checking valid directory structure was actually not Posix, so on Mac it fails, when it should not really. A bit strange.

Will fix this and push later.

Did the rest of your analysis go well ?

If you need any tips on making result plots or tables let me know.

I will close this issue in a week after next push, if there is nothing else?

Originally posted by @Roleren in https://github.com/Roleren/ORFik/issues/120#issuecomment-1073624349"

Roleren commented 1 year ago

Can you give me the output with error ?

Can you also give me:

packageVersion("ORFik")
sessionInfo()

matanel-y commented 1 year ago

Input is as follows:

STAR.align.folder(input.dir = conf["fastq Ribo-seq"],
                  output.dir = conf["bam Ribo-seq"],
                  index.dir = index, paired.end = FALSE,
                  steps = "tr-co-ge", adapter.sequence = "auto",
                  trim.front = 0, min.length = 20, multiQC = TRUE)

I get three sets of messages for each file. The first one starts out as follows:

Total number of files are: 8
expr: syntax error
Current step: Single end mode for file: Ctl_IP_1.fq.gz
File 1 / 8
-o output folder: /Users/yhesk/Bio_data/processed_data/Ribo-seq/Ribo-Seq_RPL3_KD/
-f input file: /Users/yhesk/Bio_data/raw_data/Ribo-seq/Ribo-Seq_RPL3_KD//Ctl_IP_1.fq.gz
-a adapter sequence: auto
-q quality filtering: disable
-s steps to do: tr-co-ge
-r resume (r or new n): -l
Error: the given STAR index dir does not exist!

And then it all ends with:

Error: the given STAR index dir does not exist!
done cleaning up /Users/yhesk/Bio_data/processed_data/Ribo-seq/Ribo-Seq_RPL3_KD/
Alignment done

However, the only file in the output.dir is runcommand.log.

fastp and STAR are both properly installed and detected. I was able to generate the indexes for the genome and contaminants. The index paths are:

~/Bio_data/references/Drosophila_melanogaster_dm6/STAR_index/contaminants_genomeDir

and

~/Bio_data/references/Drosophila_melanogaster_dm6/STAR_index/genomeDir
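
Since the error says the STAR index directory does not exist, one quick sanity check is to confirm from R that both directories resolve after tilde expansion. The sketch below is only an illustration: `check_index_dirs()` is a hypothetical helper, not an ORFik function, and it assumes the two sub-folder names listed above.

```r
# Hypothetical helper (not part of ORFik): report whether the expected
# STAR index sub-folders exist under a given index root.
check_index_dirs <- function(index_root,
                             subdirs = c("genomeDir", "contaminants_genomeDir")) {
  # path.expand() turns a leading "~/" into the user's home directory
  paths <- file.path(path.expand(index_root), subdirs)
  # dir.exists() returns one logical per path; name them by sub-folder
  stats::setNames(dir.exists(paths), subdirs)
}

# Example call, using the index root from this thread (assumed layout):
# check_index_dirs("~/Bio_data/references/Drosophila_melanogaster_dm6/STAR_index")
```

If either entry comes back FALSE, the path handed to `index.dir` is the first thing to look at. If both are TRUE, the problem is more likely in the shell script that performs the check.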

Other info:

packageVersion("ORFik")
[1] ‘1.19.1’

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] AnnotationDbi_1.60.0 ORFik_1.19.1 GenomicAlignments_1.34.0
[4] Rsamtools_2.14.0 Biostrings_2.66.0 XVector_0.38.0
[7] SummarizedExperiment_1.28.0 Biobase_2.58.0 MatrixGenerics_1.10.0
[10] matrixStats_0.63.0 GenomicRanges_1.50.1 GenomeInfoDb_1.34.4
[13] IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0

loaded via a namespace (and not attached):
[1] colorspace_2.0-3 rjson_0.2.21 ellipsis_0.3.2 rprojroot_2.0.3
[5] fs_1.5.2 rstudioapi_0.14 remotes_2.4.2 bit64_4.0.5
[9] fansi_1.0.3 xml2_1.3.3 codetools_0.2-18 R.methodsS3_1.8.2
[13] cachem_1.0.6 geneplotter_1.76.0 pkgload_1.3.2 jsonlite_1.8.4
[17] annotate_1.76.0 dbplyr_2.2.1 png_0.1-8 R.oo_1.25.0
[21] shiny_1.7.3 compiler_4.2.0 httr_1.4.4 assertthat_0.2.1
[25] Matrix_1.5-3 fastmap_1.1.0 cli_3.4.1 later_1.3.0
[29] htmltools_0.5.4 prettyunits_1.1.1 tools_4.2.0 gtable_0.3.1
[33] glue_1.6.2 GenomeInfoDbData_1.2.9 dplyr_1.0.10 rappdirs_0.3.3
[37] Rcpp_1.0.9 vctrs_0.5.1 rtracklayer_1.58.0 stringr_1.5.0
[41] ps_1.7.2 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3
[45] restfulr_0.0.15 biomartr_1.0.2 devtools_2.4.5 XML_3.99-0.13
[49] zlibbioc_1.44.0 scales_1.2.1 BSgenome_1.66.1 hms_1.1.2
[53] promises_1.2.0.1 parallel_4.2.0 RColorBrewer_1.1-3 yaml_2.3.6
[57] curl_4.3.3 gridExtra_2.3 memoise_2.0.1 ggplot2_3.4.0
[61] biomaRt_2.54.0 stringi_1.7.8 RSQLite_2.2.19 BiocIO_1.8.0
[65] desc_1.4.2 GenomicFeatures_1.50.3 filelock_1.0.2 pkgbuild_1.4.0
[69] BiocParallel_1.32.4 fstcore_0.9.12 rlang_1.0.6 pkgconfig_2.0.3
[73] bitops_1.0-7 lattice_0.20-45 purrr_0.3.5 htmlwidgets_1.5.4
[77] cowplot_1.1.1 bit_4.0.5 processx_3.8.0 tidyselect_1.2.0
[81] magrittr_2.0.3 DESeq2_1.38.1 R6_2.5.1 generics_0.1.3
[85] profvis_0.3.7 DelayedArray_0.24.0 DBI_1.1.3 pillar_1.8.1
[89] withr_2.5.0 KEGGREST_1.38.0 RCurl_1.98-1.9 tibble_3.1.8
[93] crayon_1.5.2 utf8_1.2.2 BiocFileCache_2.6.0 urlchecker_1.0.1
[97] progress_1.2.2 usethis_2.1.6 locfit_1.5-9.6 grid_4.2.0
[101] data.table_1.14.6 blob_1.2.3 callr_3.7.3 digest_0.6.31
[105] xtable_1.8-4 httpuv_1.6.6 R.utils_2.12.2 fst_0.9.8
[109] munsell_0.5.0 sessioninfo_1.2.2

Any help is appreciated!

Roleren commented 1 year ago

Ok, I did a push, can you pull from here with:

devtools::install_github("Roleren/ORFik")

And rerun, let me know if anything changes

Roleren commented 1 year ago

Small bugfix, redownload if you already started

matanel-y commented 1 year ago

I redownloaded but I am getting the same error.

Perhaps the problem is with my index? I was unable to use getGenomeAndAnnotation() for Drosophila melanogaster using either ensembl or refseq, so I downloaded the files myself and then ran the code as follows:

genome = "~Bio_data/references/Drosophila_melanogaster_dm6/Drosophila_melanogaster.BDGP6.32.dna.toplevel.fasta"
gtf = "~Bio_data/references/Drosophila_melanogaster_dm6/Drosophila_melanogaster.BDGP6.32.52.chr.gtf"
contaminants = "~/Bio_data/references/Drosophila_melanogaster_dm6/merged_contaminants_phix_ncRNA_tRNA_rRNA.fasta"
annotation = c("genome" = genome, "gtf" = gtf, "contaminants" = contaminants)

index = STAR.index(annotation, output.dir = conf["ref"], SAsparse = 2, max.ram = 16, remake = TRUE)

But still failed =(
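
One thing that may be worth ruling out before re-indexing: two of the paths as posted begin with `~Bio_data` rather than `~/Bio_data`. It is not clear from the thread whether that is a transcription slip or what was actually run. A leading `~name` is typically read as a user name, not as a folder inside the home directory. The sketch below uses a hypothetical helper, `missing_paths()`, which is not an ORFik function, to list any entries of the annotation vector that do not resolve to an existing file.

```r
# Hypothetical helper (not part of ORFik): return the entries of a named
# character vector of file paths that do not exist on disk.
missing_paths <- function(paths) {
  expanded <- path.expand(paths)  # expands a leading "~/" to the home directory
  found <- file.exists(expanded)  # one logical per path
  paths[!found]                   # subsetting keeps names, so entries stay identifiable
}

# Example with the annotation vector from the post above:
# missing_paths(annotation)
```

An empty result would suggest the input files are fine and the failure lies elsewhere.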

Roleren commented 1 year ago

No, should be fine I think. Can you give me full error output from that last run ? :)

matanel-y commented 1 year ago

I still have the same error as above.

Roleren commented 1 year ago

This bug goes quite deep, outside of ORFik, and will need some proper attention. Luckily, it is Mac-specific.

Will let you know when I have time to look at it more. It is a bit hard to track down since it does not happen on all Mac systems.

Roleren commented 1 year ago

Update: getGenomeAndAnnotation() works now. Do:

getGenomeAndAnnotation("drosophila melanogaster", output.dir = file.path(config["ref"], "Drosophila_melanogaster_BDGP6"), assembly_type = "toplevel")

The index issue is still not solved.

Roleren commented 1 year ago

Update: Intel/Mac has still not fixed their back-end problems. I will soon try a workaround if nothing happens on their end.

Roleren commented 8 months ago

I have now improved the POSIX compliance of the code and also added a suggestion to install fastp through conda.

This should now be resolved. If there are any more issues, please open a new issue and reference this one.