Closed zhanyuanucb closed 5 years ago
No matter what, we should not be taking the complement here. That negative label is meaningless at this point, so we need to update immediately. @zhanyuanucb, can you please modify this code? This is basically mapping the score to the complement, which changes how the nucleotides match up with the TFBS. From my understanding of what the bidirectional architecture is doing, it shouldn't affect the model drastically, since the sequence is "correct" in a sense and the TFBS is also correct, but how they relate to each other will be greatly affected in half of the sequences. This could add to the "noise" we are seeing in the nucleotide information.
@adfost or @zhanyuanucb - Is there a way to see what the output sequence is, after the above loop is run? We really need to check this.
Also, the first thing we have to do is re-run everything without this line and compare with 5_map_motif_no_threshold_14Nov2018.zip. @adfost, do you have time to reformat? How long does this take if you know what you are doing?
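One way to check the output sequences, as a minimal sketch: compare each output against its input and flag the ones that came out as the complement. The function names and the name-to-sequence dictionaries here are assumptions for illustration, not the notebook's actual code.

```python
# Hypothetical helpers: the names and the dict layout are assumptions,
# not taken from 1a_producing_output_files_with_motif.ipynb.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Return the base-wise complement (not the reverse complement)."""
    return seq.translate(COMPLEMENT)


def classify_outputs(inputs, outputs):
    """Label each output sequence relative to its input sequence."""
    report = {}
    for name, original in inputs.items():
        produced = outputs.get(name)
        if produced is None:
            report[name] = "missing"
        elif produced == original:
            report[name] = "unchanged"
        elif produced == complement(original):
            report[name] = "complemented"
        else:
            report[name] = "other"
    return report
```

Running this on the sequences before and after the loop would show how many records were complemented, which is the set to compare against the earlier results.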
Should I just totally ignore the negative strand?
I'm going to look into it right now.
I will try ignoring the negative strand.
No. Don't subset out the negative strand. That part of the header doesn't mean negative strand; it doesn't mean anything. Just remove that part of the header. All the strands are in the same direction in the FASTA file.
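A rough sketch of what "just remove that part of the header" could look like when reading the file. The header layout is an assumption: this thread does not show it, so the `(+)` / `(-)` token matched below is hypothetical and the pattern would need adjusting to the real headers. Sequences are kept exactly as they appear in the file, with no complementing.

```python
import re

# Assumption: the strand marker shows up in the header as "(+)" or "(-)".
# The real header format is not shown in this thread.
STRAND_TOKEN = re.compile(r"\s*\([+-]\)")


def read_fasta_ignoring_strand(lines):
    """Parse FASTA lines into {header: sequence}, dropping the strand token.

    Sequences are returned as written in the file; nothing is complemented.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = STRAND_TOKEN.sub("", line[1:])
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```

Passing an open file handle works as well as a list of strings, since the function only iterates over lines.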
The following code, which takes the complement of a negative strand of a DNA sequence in 1a_producing_output_files_with_motif.ipynb, looks suspicious:
I checked the header information in one region sequence file: