Closed igonzalsbmc closed 1 month ago
Hi,
Those are chromosome IDs rather than transcript IDs. You may want to double-check how you created the transcript expression file.
Cheers,
Eduardo
Now I generate this, for example:
SRR1725976 SRR1725977 SRR1725983 SRR1725984
ENSMUST00000119715.2|ENSMUSG00000081622.2|OTTMUSG00000017861.1|OTTMUST00000043230.1|Gm5937-201|Gm5937|643|processed_pseudogene| 0.236249 0.111980 0.288448 0.114749
ENSMUST00000088217.12|ENSMUSG00000025246.14|OTTMUSG00000017892.1|OTTMUST00000043270.1|Tbl1x-201|Tbl1x|5234|protein_coding| 12.498475 18.663619 17.518295 20.179102
ENSMUST00000133893.2|ENSMUSG00000025246.14|OTTMUSG00000017892.1|OTTMUST00000043271.1|Tbl1x-202|Tbl1x|3611|retained_intron| 0.099819 0.355980 0.273538 0.541361
ENSMUST00020183165.1|ENSMUSG00002076902.1|-|-|Gm55070-201|Gm55070|173|snRNA| 1.665572 0.000000 0.000000 0.000000
ENSMUST00000145498.2|ENSMUSG00000087152.4|OTTMUSG00000017882.1|OTTMUST00000043259.1|Cldn34-ps-201|Cldn34-ps|600|processed_pseudogene| 0.000000 0.000000 0.000000 0.000000
ENSMUST00000120309.2|ENSMUSG00000081358.2|OTTMUSG00000017872.1|OTTMUST00000043249.1|Gm14746-201|Gm14746|741|processed_pseudogene| 0.000000 0.000000 0.000000 0.000000
ENSMUST00000119819.2|ENSMUSG00000082842.2|OTTMUSG00000017878.1|OTTMUST00000043255.1|Gm14748-201|Gm14748|411|unprocessed_pseudogene| 0.000000 0.000000 0.000000 0.000000
ENSMUST00000121466.2|ENSMUSG00000081164.2|OTTMUSG00000017868.1|OTTMUST00000043238.1|Gm8722-201|Gm8722|769|processed_pseudogene| 0.044666 0.000000 0.000000 0.000000
ENSMUST00000036333.14|ENSMUSG00000035725.14|OTTMUSG00000017860.1|OTTMUST00000043228.1|Prkx-201|Prkx|4146|protein_coding| 5.665320 11.007346 11.237716 12.147375
ENSMUST00000114044.2|ENSMUSG00000035725.14|OTTMUSG00000017860.1|OTTMUST00000043229.1|Prkx-202|Prkx|2639|protein_coding| 0.011095 0.057386 0.072595 0.141069
ENSMUST00000148284.2|ENSMUSG00000085973.2|OTTMUSG00000017859.1|OTTMUST00000043227.1|Gm14742-201|Gm14742|381|lncRNA| 0.000000 0.000000 0.000000 0.093371
ENSMUST00000000003.14|ENSMUSG00000000003.16|OTTMUSG00000017891.1|OTTMUST00000043268.1|Pbsn-201|Pbsn|902|protein_coding| 0.000000 0.000000 0.000000 0.000000
ENSMUST00000114041.3|ENSMUSG00000000003.16|OTTMUSG00000017891.1|OTTMUST00000043269.1|Pbsn-202|Pbsn|697|protein_coding| 0.000000 0.000000 0.000000 0.000000
ENSMUST00000114039.3|ENSMUSG00000079522.3|OTTMUSG00000017866.1|OTTMUST00000043236.1|Gm14744-201|Gm14744|800|protein_coding| 0.000000 0.000000 0.041894 0.000000
But when I try to produce the iso_tpm_formatted file, I get this error (the sample above includes the duplicated entries for ENSMUSG00000000003.16):
root@7d43c7ce68f4:/opt/SUPPA# Rscript scripts/format_Ensembl_ids.R /mnt/experiments/Bebee-2015-Esrps/suppa/iso_tpm.txt
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique values when setting 'row.names': ‘ENSMUSG00000000003.16’, ‘ENSMUSG00000000028.16’, ‘ENSMUSG00000000031.19’, ‘ENSMUSG00000000037.18’, ‘ENSMUSG00000000049.12’, ‘ENSMUSG00000000056.8’, ‘ENSMUSG00000000058.7’, ‘ENSMUSG00000000078.8’, ‘ENSMUSG00000000085.17’, ‘ENSMUSG00000000088.8’, ‘ENSMUSG00000000094.13’, ‘ENSMUSG00000000103.13’, ‘ENSMUSG00000000126.12’, ‘ENSMUSG00000000127.16’, ‘ENSMUSG00000000131.16’, ‘ENSMUSG00000000134.18’, ‘ENSMUSG00000000142.16’, ‘ENSMUSG00000000148.18’, ‘ENSMUSG00000000149.11’, ‘ENSMUSG00000000154.17’, ‘ENSMUSG00000000157.17’, ‘ENSMUSG00000000159.17’, ‘ENSMUSG00000000167.15’, ‘ENSMUSG00000000168.11’, ‘ENSMUSG00000000184.13’, ‘ENSMUSG00000000194.14’, ‘ENSMUSG00000000197.9’, ‘ENSMUSG00000000202.10’, ‘ENSMUSG00000000204.17’, ‘ENSMUSG00000000214.12’, ‘ENSMUSG00000000215.12’, ‘ENSMUSG00000000223.14’, ‘ENSMUSG00000000244.18’, ‘E [... truncated]
Execution halted
Apparently, the error is in fact in format_Ensembl_ids.R. Shouldn't the script take the first value after splitting by |?
I reckon the script is expecting a different format or number of IDs.
For me the code that worked was:
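#!/usr/bin/Rscript
# Run from bash: format the Ensembl transcript IDs in the expression file for SUPPA

# Parse command line arguments
CHARACTER_command_args <- commandArgs(trailingOnly=TRUE)
# CHARACTER_command_args[1] <- "/projects_rg/SCLC_cohorts/George/Salmon/v2/iso_tpm.txt"

if (length(CHARACTER_command_args) == 1) {
  # When the header has one field fewer than the data rows (as in the sample above),
  # read.table() uses the first column, the full "|"-separated ID, as row names
  file <- read.table(file=CHARACTER_command_args[1])
  # Keep only the first "|"-separated field, i.e. the transcript ID
  ids <- unlist(lapply(rownames(file), function(x) strsplit(x, "\\|")[[1]][1]))
  rownames(file) <- ids
  # Drop the 4-character ".txt" suffix and write <input>_formatted.txt
  output_path <- paste0(substr(CHARACTER_command_args[1], 1, nchar(CHARACTER_command_args[1]) - 4), "_formatted.txt")
  write.table(file, file=output_path, quote=FALSE, row.names=TRUE, col.names=TRUE, sep="\t")
} else {
  stop("Expected exactly one argument: the path to the transcript expression file")
}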
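To illustrate why picking the first field avoids the error, here is a minimal sketch using four of the IDs from the sample above. The original format_Ensembl_ids.R is not shown in this thread, so it is only an assumption (based on the gene IDs listed in the warning) that it ended up using the second field as row names. The helper `field` is hypothetical and exists only for this example.

```r
# Four of the row names from the sample above (two transcripts for each of two genes)
ids <- c(
  "ENSMUST00000088217.12|ENSMUSG00000025246.14|OTTMUSG00000017892.1|OTTMUST00000043270.1|Tbl1x-201|Tbl1x|5234|protein_coding|",
  "ENSMUST00000133893.2|ENSMUSG00000025246.14|OTTMUSG00000017892.1|OTTMUST00000043271.1|Tbl1x-202|Tbl1x|3611|retained_intron|",
  "ENSMUST00000000003.14|ENSMUSG00000000003.16|OTTMUSG00000017891.1|OTTMUST00000043268.1|Pbsn-201|Pbsn|902|protein_coding|",
  "ENSMUST00000114041.3|ENSMUSG00000000003.16|OTTMUSG00000017891.1|OTTMUST00000043269.1|Pbsn-202|Pbsn|697|protein_coding|"
)

# Hypothetical helper: return the n-th "|"-separated field of each ID
field <- function(x, n) {
  vapply(strsplit(x, "|", fixed = TRUE), `[`, character(1), n)
}

transcript_ids <- field(ids, 1)  # ENSMUST... (one per row)
gene_ids <- field(ids, 2)        # ENSMUSG... (shared by transcripts of the same gene)

anyDuplicated(transcript_ids)  # 0: safe to use as row names
anyDuplicated(gene_ids)        # non-zero: setting these as row names would fail
```

Running a check like `anyDuplicated()` on the extracted IDs before assigning them with `rownames<-` makes the cause visible without hitting the "duplicate 'row.names' are not allowed" error.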
I have this error when:
The iso_tpm.txt file is: