comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Problems on multipleFieldSelection.py #163

Open Dongmeng-wang opened 1 year ago

Dongmeng-wang commented 1 year ago

Hi, thank you so much for sharing this helpful script for merging expression files of different samples. However, I encountered some problems when using it.

Is it possible to use more than one common field as the identifier? In my case, each sample has a file with the counts of reads mapping to different junctions. The columns are chrom, start, end and counts. I'd like to merge the files of all samples together, which requires the first three columns together as the identifier. Can this script do that?

Furthermore, I always get the error below when merging files whose identifiers differ. For example, different samples contain different junctions, and I would like to keep all the junctions and set the count to 0 for samples that lack a junction.

INFO: Writing output to merge.1.txt
Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 125, in
    f.write("\t".join(line) + "\n")
TypeError: sequence item 2: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 130, in
    print("ERROR: %s" % err)
NameError: name 'err' is not defined
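For reference, the TypeError in the first traceback is what `str.join` raises when an item in the sequence is not a string, which would be consistent with an integer fill value (such as 0 for a missing junction) ending up in the output row. The NameError in the second traceback means the handler prints a name `err` that was never bound. Below is a minimal sketch that reproduces the first error and shows one possible fix. The row contents and the `format_row` helper are hypothetical and are not taken from multipleFieldSelection.py.

```python
def format_row(fields):
    """Join fields into one tab-separated line, converting each to str first."""
    return "\t".join(str(field) for field in fields) + "\n"


# Hypothetical output row: an integer 0 stands in for a missing count.
row = ["chr1", "100", 0]

try:
    "\t".join(row)  # fails because item 2 is an int, not a str
except TypeError as err:  # binding the exception as `err` avoids the NameError
    print("ERROR: %s" % err)

# Converting every field to str before joining sidesteps the TypeError.
print(format_row(row), end="")
```

Whether the script should convert values at write time or store the fill value as the string "0" in the first place is a design choice for the maintainers; either would avoid the mixed-type row.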

I have attached some files here for testing: TWPID9206_20170313.txt, TWPID9206_20110217.txt, TWPID9206_20140812.txt

By the way, I have tried the csvtk join command and merge() in R, but both take too much time with ~1000 samples. I would really appreciate it if this script could handle the merge in less time. Or do you recommend any other tools for this problem? Thank you so much.
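To make the request concrete, here is a rough sketch of the kind of merge described above, using only the Python standard library. It is not the SUPPA script and makes several assumptions: each input file is tab-separated with no header row, the first three columns (chrom, start, end) form the identifier, and the fourth column is the count. The function names `merge_counts` and `write_merged` are hypothetical. Each file is read once, and junctions missing from a sample are filled with "0".

```python
import csv


def merge_counts(paths, key_cols=3, fill="0"):
    """Merge count files on their first `key_cols` columns.

    Returns a dict mapping each key tuple to a list with one count per file,
    using `fill` where a file does not contain that key.
    """
    merged = {}
    for index, path in enumerate(paths):
        with open(path, newline="") as handle:
            for fields in csv.reader(handle, delimiter="\t"):
                if not fields:
                    continue  # skip blank lines
                key = tuple(fields[:key_cols])
                counts = merged.get(key)
                if counts is None:
                    # First time this junction is seen: pre-fill every sample.
                    counts = merged[key] = [fill] * len(paths)
                counts[index] = fields[key_cols]
    return merged


def write_merged(merged, paths, out_path):
    """Write the merged table, one column per input file, all fields as str."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["chrom", "start", "end"] + list(paths))
        for key, counts in merged.items():
            writer.writerow(list(key) + counts)
```

A call such as `write_merged(merge_counts(paths), paths, "merged.txt")` would produce one row per junction in first-seen order. All keys and counts are held in memory at once, so whether this is practical for ~1000 samples depends on the total number of distinct junctions.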

All the best, Meng