digitalcytometry / ecotyper

EcoTyper is a machine learning framework for large-scale identification of cell states and cellular ecosystems from gene expression data.

Metadata problem #69

Closed dmiyagi closed 1 year ago

dmiyagi commented 1 year ago

Hi, I've been using your pipeline. Specifically, I ran Discovery on scRNA-seq data and am applying it to bulk data. It works wonderfully, but when I try to use one specific metadata column, "Grade", it fails. This doesn't happen with other columns. I've checked for the usual issues (e.g., odd characters, spaces, symbols) and found nothing.

In Python, where I generate this dataset, `unique_histology = metadata["Grade"].unique()` followed by `print(unique_histology)` outputs `['G1' 'G2']`.
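The manual checks mentioned above (odd characters, spaces, symbols, missing values) can also be run directly against the annotation file that is passed to `-a`. `.unique()` on the in-memory DataFrame can hide problems that only appear in the written file. The following is a minimal sketch using only Python's standard library. It assumes the file is tab-delimited with a header row. `audit_column`, `MISSING_TOKENS`, and the default `ID` sample column are hypothetical names chosen for illustration and are not part of EcoTyper. The sketch only flags suspicious values and does not reproduce how EcoTyper itself parses the file.

```python
import csv
import io
from collections import Counter

# Strings treated as "missing" (an assumption; adjust to your own pipeline).
MISSING_TOKENS = {"", "NA", "NaN", "nan", "None"}


def audit_column(tsv_text, column, id_column="ID"):
    """Flag potentially problematic values in one metadata column.

    Assumes tab-delimited text with a header row. `id_column` is a
    hypothetical default naming the sample-ID column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not in header {reader.fieldnames}")

    report = {"counts": Counter(), "missing": [], "whitespace": [], "non_ascii": []}
    for row in reader:
        sample = row.get(id_column)
        value = row[column]
        if value is None:
            # DictReader fills absent trailing fields with None (its restval),
            # so a short row ends up here.
            report["missing"].append(sample)
            continue
        report["counts"][value] += 1
        if value.strip() in MISSING_TOKENS:
            report["missing"].append(sample)
        if value != value.strip():
            # Leading/trailing spaces make "G2 " a different level from "G2".
            report["whitespace"].append(sample)
        if not value.isascii():
            report["non_ascii"].append(sample)
    return report


if __name__ == "__main__":
    demo = "ID\tGrade\nS1\tG1\nS2\tG2 \nS3\tNA\n"
    print(audit_column(demo, "Grade"))
```

To audit a real file, pass its contents as the first argument, e.g. `audit_column(open(path).read(), "Grade")`. Using plain `csv` rather than pandas is deliberate: pandas may trim or coerce values on read, which would mask the very issues being checked.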

Specifically, this is what happens:

```
Rscript EcoTyper_recovery_bulk.R -d discovery_scRNA -m /path/bulk_recovery_microarray.txt -a /path/bulk_recovery_microarray_metadata.txt -c Grade,KnownDriver -o /path/Output /BulkOutput -t 5

Warning message:
In readLines(con, warn = readLines.warn) :
  incomplete final line found on '/path/discovery_scRNA/config_used.yml'
Running cell state recovery on: X-Like...
Running cell state recovery on: Smooth_Muscle...
Running cell state recovery on: Endothelial...
Running cell state recovery on: Erythroid...
Running cell state recovery on: Other...
No legend element is put in the last 1 column under ncol = 4, maybe you should set by_row = TRUE? Reset ncol to 3.
Error in x[index] : only 0's may be mixed with negative subscripts
Execution halted
Running cell state recovery on: Tcells...
Running cell state recovery on: Tumor_1...
Running cell state recovery on: Tumor_2...
Running cell state recovery on: Tumor_3...
Running cell state recovery on: TAM...
Error in RunJobQueue() : EcoTyper failed. Please check the error message above!
Execution halted
```

If I use any other combination of columns, it works. I am stumped. I have also attached my package versions (Renv.txt). Thank you so much!

dmiyagi commented 1 year ago

Interestingly, I reran my pipeline from the beginning (including the single-cell analysis) without changing anything in my code, and it worked fine the second time. I'm not sure where the problem was, but I'm circling back in case anyone else runs into the same problem.