MonashBioinformaticsPlatform / polyApipe

GNU Lesser General Public License v2.1

different row counts implied by arguments #6

Open wyt14 opened 3 months ago

wyt14 commented 3 months ago

I encountered the following problem when running do_pipeline. What do you think is the cause? Thank you very much.

res = do_pipeline(
    out_path        = "...",
    counts_file_dir = ".../polyAbam_counts/",
    peak_info_file  = "polyAbam_polyA_peaks.gff",
    organism        = "human_ens100",
    cells_to_use    = cells_to_use,
    cell_name_func  = cell_name_func
    )

-- 1/4 load --
Loading .../polyAbam_counts//BT1290.tab.gz
Loaded 23949 x 50 matrix of counts
Loading .../polyAbam_counts//BT1292.tab.gz
Loaded 0 x 0 matrix of counts
Error in DataFrame(cell = colnames(counts_matrix), barcode = cell_to_barcode[colnames(counts_matrix)], :
  different row counts implied by arguments

pfh commented 3 months ago

Hi wyt14.

I'm finding it quite hard to read your example code. You can wrap your code in triple backticks to make it readable.

Example code.

I'm guessing the problem is something to do with the Loaded 0 x 0 matrix of counts.

wyt14 commented 3 months ago

I'm very sorry. I have reformatted the code. The main issue is that my BT1290.tab.gz and BT1292.tab.gz were generated by the same code, yet in do_pipeline BT1290 loaded successfully while BT1292 did not. I'm not sure what went wrong. I look forward to your response.

pfh commented 3 months ago

I would have a look at those .tab.gz files. Is one of them empty?

Also check whether your cells_to_use includes cells from both of them, and whether those names match how cell_name_func is naming things.
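
Both checks can be run from R. The sketch below assumes the counts files are tab-separated with gene, cell and count columns, and it writes a tiny stand-in file to a temporary directory so that it runs on its own. `check_counts_file` is a hypothetical helper for illustration, not part of polyApiper.

```r
# Hypothetical helper (not part of polyApiper): report how many barcodes in a
# counts file survive the cells_to_use filter after cell_name_func renaming.
# Assumes a tab-separated file with header columns gene, cell, count.
check_counts_file <- function(path, batch, cells_to_use, cell_name_func) {
    counts <- read.delim(gzfile(path), stringsAsFactors = FALSE)
    cell_names <- unique(cell_name_func(batch, counts$cell))
    c(rows       = nrow(counts),
      cells      = length(cell_names),
      cells_kept = sum(cell_names %in% cells_to_use))
}

# Stand-in data so the sketch is self-contained; replace with real paths.
demo_path <- file.path(tempdir(), "BT1292.tab.gz")
con <- gzfile(demo_path, "w")
writeLines(c("gene\tcell\tcount",
             "GL000008.2_3067_f\tAGGGTGAAGCATCATC\t1",
             "GL000008.2_85625_f\tAAAGCAAAGGAATGGA\t1"), con)
close(con)

# Illustrative naming function and selection, both using BT1290 names only.
cell_name_func <- function(batch, barcode) paste0(batch, "_", barcode)
cells_to_use <- c("BT1290_AAACCTGGTCCATGAT", "BT1290_AAACGGGTCGGCCGAT")

demo_check <- check_counts_file(demo_path, "BT1292", cells_to_use, cell_name_func)
print(demo_check)
```

A file that has rows but reports zero for `cells_kept` would be consistent with the "Loaded 0 x 0 matrix of counts" message, although that is an inference from the log rather than something confirmed against the polyApiper source.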

wyt14 commented 3 months ago

BT1290.tab.gz content:

gene                 cell              count
GL000008.2_3067_f    CCTTCCCGTCCGCTGA  1
GL000008.2_85625_f   ACAGCCGCAAGACACG  1
GL000008.2_85625_f   ATCACGAGTTAAGTAG  1
GL000008.2_85625_f   GTGCTTCTCAAAGACA  1
GL000008.2_85625_f   TGTATTCAGTCAAGGC  1
GL000009.2_128471_f  CAGCTAAGTCGCGGTT  1

BT1292.tab.gz content:

gene                 cell              count
GL000008.2_3067_f    AGGGTGAAGCATCATC  1
GL000008.2_85625_f   AAAGCAAAGGAATGGA  1
GL000008.2_85625_f   ACGGAGAAGTAGGTGC  1
GL000008.2_85625_f   ACTTGTTTCTTGACGA  1
GL000008.2_85625_f   AGATCTGTCAAAGACA  1

barcode_total.txt content:

BT1290_AAACCTGGTCCATGAT-1
BT1290_AAACGGGTCGGCCGAT-1
BT1290_AAAGATGAGAGTAATC-1
BT1290_AAAGATGAGTATCGAA-1
BT1290_AAAGATGCAGGATCGA-1
BT1290_AAAGCAAAGCAGCGTA-1

code:

polyApiper:::do_ensembl_organism(
    out_path="human_ens100", 
    species="Homo sapiens", 
    version="100")

# Which cell names to use.
tsne <- read.csv("/data/polyApipe/LUAD/barcode_total.txt")
cells_to_use <- matrix(unlist(strsplit(tsne[,1],split="-")),ncol=2,byrow=T)[1:50,1]

# How cell names are constructed from batch name and cell barcode.
# Should match naming in cells_to_use.
cell_name_func <- function(batch,barcode) paste0(batch,"_",barcode)

# - Load output from polyApipe.py
# - Produce HDF5Array SingleCellExperiment objects containing counts
# - Perform further analysis steps
res = do_pipeline(
    out_path      ="/data/polyApipe/LUAD", 
    counts_file_dir  = "/data/polyApipe/LUAD/polyAbam_counts/",
    batch_names   =c("BT1290"),
    peak_info_file="polyAbam_polyA_peaks.gff", 
    organism      ="human_ens100",
    cells_to_use  =cells_to_use,
    cell_name_func=cell_name_func
    )

pfh commented 3 months ago

Ok, so are there any BT1292 cells in barcode_total.txt? If not perhaps just give the BT1290 filename, using counts_files rather than the counts_file_dir argument.
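
A quick way to answer that question is to tabulate the batch prefixes in cells_to_use and pass only the files for batches that are present. The sketch below uses a few cell names from the listing above as a stand-in for the real selection. The variable names `batch_of_cell`, `batches_present` and `counts_files` are illustrative, and the do_pipeline call is left commented out because it needs polyApiper and the real data.

```r
# Stand-in for the first entries of barcode_total.txt after stripping "-1".
cells_to_use <- c("BT1290_AAACCTGGTCCATGAT",
                  "BT1290_AAACGGGTCGGCCGAT",
                  "BT1290_AAAGATGAGAGTAATC")

# The batch prefix is everything before the first underscore.
batch_of_cell <- sub("_.*$", "", cells_to_use)
print(table(batch_of_cell))

# Keep only the batches that actually have cells selected.
all_batches <- c("BT1290", "BT1292")
batches_present <- intersect(all_batches, unique(batch_of_cell))

counts_dir <- "/data/polyApipe/LUAD/polyAbam_counts"
counts_files <- file.path(counts_dir, paste0(batches_present, ".tab.gz"))
print(counts_files)

# Not run here: requires polyApiper and the real files.
# res <- do_pipeline(
#     out_path       = "/data/polyApipe/LUAD",
#     counts_files   = counts_files,
#     batch_names    = batches_present,
#     peak_info_file = "polyAbam_polyA_peaks.gff",
#     organism       = "human_ens100",
#     cells_to_use   = cells_to_use,
#     cell_name_func = cell_name_func)
```

Note that the posted code keeps only the first 50 entries of barcode_total.txt, so if that file lists all BT1290 cells first, no BT1292 cells would be selected.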

wyt14 commented 3 months ago

Thank you very much for the reminder. I re-verified the barcode_total.txt file, and now BT1290 and BT1292 can be read normally. However, a new issue has arisen, and I look forward to your assistance. #7