Hi, thank you so much for sharing this helpful script for merging expression files of different samples. However, I encountered some problems when using it.
Is it possible to use more than one common field as the identifier? In my case, I have the counts of reads mapping to different junctions for each sample. The columns are `chrom`, `start`, `end` and `counts`. I'd like to merge the files of all samples together, which requires the first three columns together as the identifier. Is this possible with this script?
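For reference, merging on several columns amounts to using the tuple `(chrom, start, end)` as a single dictionary key. The snippet below is a minimal Python sketch under that assumption. It is not part of the script. `outer_merge` and the sample names `s1`/`s2` are hypothetical. The sketch keeps every junction seen in any sample and fills in 0 where a sample lacks that junction.

```python
from typing import Dict, List, Tuple

# Hypothetical alias: a junction is identified by (chrom, start, end).
JunctionKey = Tuple[str, int, int]


def outer_merge(samples: Dict[str, Dict[JunctionKey, int]]) -> Tuple[List[str], List[list]]:
    """Outer-join per-sample junction counts on the composite key.

    Every junction present in any sample is kept; samples that lack it get 0.
    """
    names = list(samples)
    # Union of all composite keys across samples, sorted for stable output.
    all_keys = sorted(set().union(*(counts.keys() for counts in samples.values())))
    header = ["chrom", "start", "end"] + names
    rows = []
    for key in all_keys:
        rows.append(list(key) + [samples[name].get(key, 0) for name in names])
    return header, rows


# Toy data (made up for illustration, not taken from the attached files).
samples = {
    "s1": {("chr1", 100, 200): 5, ("chr1", 300, 400): 2},
    "s2": {("chr1", 100, 200): 7, ("chr2", 50, 80): 1},
}
header, rows = outer_merge(samples)
for row in [header] + rows:
    print("\t".join(map(str, row)))
```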
Furthermore, I always get the error below when merging files whose identifiers differ. For example, different samples contain different junctions. I would like to keep all junctions and set the count to 0 for samples that lack a given junction.
```
INFO: Writing output to merge.1.txt
Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 125, in
    f.write("\t".join(line) + "\n")
TypeError: sequence item 2: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 130, in
    print("ERROR: %s" % err)
NameError: name 'err' is not defined
```
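Both tracebacks match common Python pitfalls. Without seeing the script, the causes below are a guess.

* **The `TypeError`.** `str.join()` accepts only strings, so an integer in `line` raises this error. That integer is plausibly a 0 filled in for a missing junction. Converting each field with `str` avoids it.
* **The `NameError`.** This suggests the `except` clause does not bind the exception, for example `except Exception:` rather than `except Exception as err:`.

Here is a minimal sketch of both fixes. The function names `write_rows` and `main` are hypothetical.

```python
import io
import sys
from typing import Iterable, List, TextIO


def write_rows(f: TextIO, rows: Iterable[List[object]]) -> None:
    """Write rows as tab-separated lines, converting every field to str first."""
    for line in rows:
        # "\t".join(line) fails on ints; map(str, ...) handles ints and strings alike.
        f.write("\t".join(map(str, line)) + "\n")


def main(out_path: str, rows: List[List[object]]) -> int:
    try:
        with open(out_path, "w") as f:
            write_rows(f, rows)
    except OSError as err:  # "as err" binds the name used in the message below
        print("ERROR: %s" % err, file=sys.stderr)
        return 1
    return 0


# Demonstration on an in-memory buffer: the integer fields no longer raise.
buf = io.StringIO()
write_rows(buf, [["chr1", 100, 200, 0, 5]])
print(buf.getvalue(), end="")
```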
By the way, I have tried the `csvtk join` command and `merge()` in R, but both take too long with ~1000 samples. I would really appreciate it if this script could do the job faster. Alternatively, could you recommend any other tools for this problem? Thank you so much.
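On the speed question, one approach that tends to scale to many samples is to read each file exactly once. Counts accumulate in a dictionary keyed by `(chrom, start, end)`, with one slot per sample, instead of chaining pairwise joins. This is a minimal sketch with these assumptions:

* Inputs are tab-separated, with columns `chrom`, `start`, `end`, `counts`.
* There is no header line. Pass `has_header=True` otherwise.
* The function name `merge_junction_files` is hypothetical.
* Using the file name as the sample name is also a hypothetical convention.

```python
import csv
import os
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # (chrom, start, end)


def merge_junction_files(paths: List[str], out_path: str, has_header: bool = False) -> int:
    """Outer-merge per-sample junction count files into one table.

    Assumes tab-separated inputs with columns chrom, start, end, counts.
    Returns the number of distinct junctions written.
    """
    n = len(paths)
    table: Dict[Key, List[int]] = {}
    names: List[str] = []
    for i, path in enumerate(paths):
        # Hypothetical convention: sample name = file name without extension.
        names.append(os.path.splitext(os.path.basename(path))[0])
        with open(path, newline="") as fh:
            reader = csv.reader(fh, delimiter="\t")
            if has_header:
                next(reader, None)  # skip the header line
            for row in reader:
                if not row:
                    continue  # ignore blank lines
                chrom, start, end, count = row[:4]
                key = (chrom, int(start), int(end))
                slots = table.get(key)
                if slots is None:
                    # First time this junction is seen: 0 for every sample.
                    slots = table[key] = [0] * n
                slots[i] += int(count)  # duplicate rows within a file are summed
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(["chrom", "start", "end"] + names)
        for key in sorted(table):
            writer.writerow([*key, *table[key]])
    return len(table)
```

For example, `merge_junction_files(["TWPID9206_20110217.txt", "TWPID9206_20140812.txt", "TWPID9206_20170313.txt"], "merged.txt")` would write one count column per file.

Memory grows with the number of distinct junctions times the number of samples. For very large inputs, a NumPy array or a sparse matrix may be a better container than Python lists.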
I have attached some files for testing: TWPID9206_20170313.txt, TWPID9206_20110217.txt, TWPID9206_20140812.txt.
All the best,
Meng