alanlorenzetti closed this issue 6 years ago.
I'm not sure what is happening with the first one. The second one is caused by an extra "=" in the last column of the GFF file, probably in the ID or the repeat name.
For example, the last column of the GFF file should have a structure like this: "ID=repeat_Chr3_2077949_2077951;Name=karma;". If a repeat name looks like "Name=karma=1;", you will get the error: too many values to unpack.
Please check whether there is an extra "=" in "OUTDIR/repeat/results/ALL.all_nonref_insert.raw.gff".
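To illustrate, the sketch below reproduces the failure outside RelocaTE2. It assumes the script splits the last column on ";" and then unpacks each attribute with the `re.split(r'\=', attr)` call shown in the traceback; the surrounding loop and the `parse_attr_*` helper names are hypothetical, not RelocaTE2 code. Splitting on the first "=" only (`str.split('=', 1)`) is one possible workaround if editing the script is acceptable; renaming the repeats avoids touching the code at all.

```python
import re

def parse_attr_strict(attr):
    # Mirrors the failing line in clean_false_positive.py: exactly one "=" is expected,
    # so "Name=karma=1" splits into three pieces and the two-name unpack fails.
    idx, value = re.split(r'\=', attr)
    return idx, value

def parse_attr_tolerant(attr):
    # Hypothetical fix: split only on the first "=", so "Name=karma=1" -> ("Name", "karma=1").
    idx, value = attr.split('=', 1)
    return idx, value

column9 = "ID=repeat_Chr3_2077949_2077951;Name=karma=1;"
attrs = [a for a in column9.split(';') if a]  # drop the empty string left by the trailing ";"

for attr in attrs:
    try:
        print(parse_attr_strict(attr))
    except ValueError as err:
        print("strict parser failed on", repr(attr), "->", err)
    print(parse_attr_tolerant(attr))
```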
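A quick way to run that check is to scan the last (ninth) column of every record and flag attributes that contain more than one "=". This is a minimal sketch assuming a tab-separated, nine-column GFF file; `find_extra_equals` and `report` are hypothetical helper names, not part of RelocaTE2.

```python
def find_extra_equals(lines):
    """Yield (line_number, attribute) for column-9 attributes containing more than one '='."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith('#') or not line.strip():
            continue  # skip comments/directives and blank lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9:
            continue  # not a full nine-column GFF record
        for attr in fields[8].split(';'):
            if attr.count('=') > 1:
                yield lineno, attr

def report(path):
    """Print every offending attribute found in the GFF file at `path`."""
    with open(path) as handle:
        for lineno, attr in find_extra_equals(handle):
            print(f"line {lineno}: {attr}")
```

Usage: `report("OUTDIR/repeat/results/ALL.all_nonref_insert.raw.gff")`, with OUTDIR replaced by the actual output directory. No output means no attribute in that file has a second "=".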
Thank you very much. The first one doesn't occur in every sequencing library, so I don't think it is a real problem. I didn't find any "=" characters in repeat names, but there are other characters (e.g. ":" and "_"). I'm going to test removing these characters and will report back.
I removed all special characters from TE names and now the error is gone.
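For reference, the renaming step can be sketched as below. This assumes the TE names come from the header IDs of the TE FASTA file given to RelocaTE2 and that "special characters" means anything other than letters and digits; both are assumptions, and the helper names are hypothetical. The collision check matters because stripping characters can map two distinct names (e.g. "te_1" and "te:1") to the same ID.

```python
import re

def sanitize_te_name(name):
    # Keep only letters and digits; ":", "_", "=", ";" and similar characters are dropped.
    return re.sub(r'[^A-Za-z0-9]', '', name)

def sanitize_fasta_headers(lines):
    """Return FASTA lines with each header ID sanitized; raise if two IDs collide."""
    seen = {}  # sanitized ID -> original ID
    out = []
    for line in lines:
        if not line.startswith('>'):
            out.append(line)  # sequence lines pass through unchanged
            continue
        parts = line[1:].rstrip('\n').split(None, 1)
        if not parts:
            raise ValueError("empty FASTA header")
        original = parts[0]
        new_id = sanitize_te_name(original)
        if not new_id:
            raise ValueError(f"{original!r} has no letters or digits left")
        if new_id in seen and seen[new_id] != original:
            raise ValueError(f"{original!r} and {seen[new_id]!r} both become {new_id!r}")
        seen[new_id] = original
        rest = ' ' + parts[1] if len(parts) > 1 else ''  # keep any header description
        out.append('>' + new_id + rest + '\n')  # headers are always written with a newline
    return out
```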
However, there is still an error during execution:
Traceback (most recent call last):
File "/home/alorenzetti/bin/RelocaTE2/scripts/clean_false_positive.py", line 134, in
Is this error related to the removal of insertions that match known insertions provided by the user? Can you help me solve it?
I would also like to ask another question:
Why is there no TE name on insertion records that were found only with supporting reads? In these cases the name is reported as "repeat_name". These records also report a range equal to the input insert size plus twenty percent of the insert size (size + size * 0.2). Can I say that an unknown insertion occurs somewhere within this range?
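As a worked example of the formula quoted here (size + size * 0.2), the sketch computes the range width for a few insert sizes, including the 600 nt maximum these libraries reach. It only reproduces the arithmetic stated in this issue; how RelocaTE2 positions the range relative to the supporting reads is not described in this thread, and `reported_range_width` is a hypothetical name.

```python
def reported_range_width(insert_size):
    """Range width for records found only with supporting reads, per the quoted formula."""
    return insert_size + insert_size * 0.2

# Insert sizes in these libraries vary, up to a maximum of 600 nt.
for size in (200, 400, 600):
    print(size, "->", round(reported_range_width(size)))
```

With a variable insert size, the width depends on which single insert size value was given as input, so a 600 nt setting yields a 720 nt range even for fragments that were much shorter.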
First of all, I should say that I'm trying to use the tool to find new insertion sites in prokaryotic genomes, and in doing so I'm violating some of the software's requirements:
(i) I'm using paired-end RNA-Seq data rather than DNA-Seq data; (ii) these libraries have no fixed insert length, since in this experiment we are using a range of insert sizes (max 600 nt).
Despite that, I have the software properly set up and working, and I got results with this approach. However, I got some errors during execution, and I would like to know whether they are signs that the analysis was compromised.
The first error occurs at the beginning of execution:
Traceback (most recent call last):
File "/home/alorenzetti/bin/RelocaTE2/scripts/relocaTE_trim.py", line 462, in
main()
File "/home/alorenzetti/bin/RelocaTE2/scripts/relocaTE_trim.py", line 287, in main
coord = parse_align_blat(align_file, tandem_file, verbose)
File "/home/alorenzetti/bin/RelocaTE2/scripts/relocaTE_trim.py", line 46, in parse_align_blat
next(filehd)
StopIteration
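One possible reading of this traceback: a bare `next(filehd)` raises StopIteration when the file behind `filehd` has no lines left, for example an empty BLAT alignment file for a library in which no reads matched the TE sequences. That would fit the observation in this thread that the error does not occur in every library, but it is an assumption the thread does not confirm. The sketch reproduces the behavior with an in-memory file; the helper names are hypothetical and this is not the RelocaTE2 code. Passing a default to `next()` is one way to guard against an empty file.

```python
import io

def skip_header_strict(filehd):
    # Bare next() on an exhausted file raises StopIteration, as in the traceback.
    next(filehd)
    return list(filehd)

def skip_header_guarded(filehd):
    # Hypothetical guard: next() with a default returns None instead of raising.
    if next(filehd, None) is None:
        return []  # empty alignment file: nothing to parse
    return list(filehd)

try:
    skip_header_strict(io.StringIO(""))
except StopIteration:
    print("StopIteration: the file had no lines")

print(skip_header_guarded(io.StringIO("")))
```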
The second one occurs at the end (Step 6):
Traceback (most recent call last):
File "/home/alorenzetti/bin/RelocaTE2/scripts/clean_false_positive.py", line 134, in
main()
File "/home/alorenzetti/bin/RelocaTE2/scripts/clean_false_positive.py", line 130, in main
Overlap_TE_boundary(os.path.splitext(args.input)[0], args.refte, args.distance, args.bedtools)
File "/home/alorenzetti/bin/RelocaTE2/scripts/clean_false_positive.py", line 79, in Overlap_TE_boundary
idx, value = re.split(r'\=', attr)
ValueError: too many values to unpack
Can you help me with this issue?
Cheers