Closed: hattrill closed this issue 1 year ago.
|  | COMMA (AND) | PIPE (OR) | TOTAL | PIPES that are InterPro | PIPES that are not InterPro |
| -- | -- | -- | -- | -- | -- |
| GAF06 | 2544 | 8 | 2552 | 2 | 6 |
| GAF05 | 543 | 722 | 1265 | 716 | 6 |
| GAF04 | 544 | 691 | 1235 | 685 | 6 |
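Counts like the ones above could be reproduced with a small Python sketch such as the following. It assumes GAF 2.x tab-separated input with the With/From field in column 8, and that InterPro IDs carry an `InterPro:` prefix. It also assumes a counting rule that the notes don't spell out: each annotation line is counted once, as PIPE if its With/From contains a pipe, otherwise as COMMA if it contains a comma. `classify_with_from` is a hypothetical name.

```python
from collections import Counter

# Assumption: GAF 2.x, tab-separated; With/From is column 8 (0-based index 7).
WITH_FROM_COL = 7


def classify_with_from(gaf_lines):
    """Tally annotation lines by the separator used in the With/From field.

    Assumed rule: a line counts as "pipe" if With/From contains "|",
    otherwise as "comma" if it contains ",". Pipe lines are further split
    by whether every ID in them is an InterPro ID.
    """
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= WITH_FROM_COL:
            continue
        with_from = cols[WITH_FROM_COL]
        if "|" in with_from:
            counts["pipe"] += 1
            ids = [i for part in with_from.split("|") for i in part.split(",")]
            if all(i.startswith("InterPro:") for i in ids):
                counts["pipe_interpro"] += 1
            else:
                counts["pipe_not_interpro"] += 1
        elif "," in with_from:
            counts["comma"] += 1
    counts["total"] = counts["comma"] + counts["pipe"]
    return counts
```

Passing an open GAF file handle should give one row of the table for that release, provided the counting rule assumed here matches the one actually used.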
Just adding some notes so I can look at this issue before the Xmas break. The issue was reported in https://flybase.atlassian.net/browse/WEB-2095.
The 05 XML has this entry:
The 06 XML is slightly different in format:
This resulted in wrapping issues that Jim has fixed.
I need to look at the input and output for these lines:
I’ve just found an example with UniProt IDs in strings that is on FB2022_05, so this may have been around for a while without being picked up: we don’t have many annotations with long strings of IDs, and it only shows if you are on a diddy screen.
As you pointed out, for the InterPro IDs, the reason this is showing up now seems to be the change of separator from “comma-space” to “comma”.
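One plausible reason the separator change matters for display is that browsers generally won't break a line after a comma immediately followed by a letter. A long comma-only run of IDs therefore behaves like one unbreakable word, while a space gives a break opportunity. A minimal display-side sketch, assuming the fix belongs in the web layer and not necessarily matching how Jim fixed it, adds a break opportunity after each comma without touching the GAF value. `with_from_for_display` is a hypothetical helper.

```python
ZWSP = "\u200b"  # zero-width space: invisible, but gives the browser a place to break


def with_from_for_display(with_from: str, use_space: bool = True) -> str:
    """Return a display copy of a With/From string with a break opportunity after each comma.

    Hypothetical web-display helper: the stored/GAF value is left unchanged.
    use_space=True restores the old "comma-space" look; False keeps a bare
    comma followed by an invisible zero-width space.
    """
    sep = ", " if use_space else "," + ZWSP
    return sep.join(with_from.split(","))
```

Pipes are left alone here; only commas get the extra break opportunity.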
In the past, it looks like we’ve used a “comma” for InterPro in the GAF output, but I think a pipe separator would be more appropriate, since comma = AND and pipe = OR. As I understand it, this is not how it is stored in Chado, so I need to understand a bit more about how this “transformation” is handled in the pipeline.
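The AND/OR distinction can be made concrete by parsing a With/From value into pipe-separated alternatives (OR), each a comma-joined group of IDs (AND). The sketch below does that and adds one hypothetical rewrite that splits all-InterPro comma groups into separate pipe alternatives. It does not describe what the pipeline or Chado currently do, and the function names and the `InterPro:` prefix are assumptions.

```python
from typing import List


def parse_with_from(with_from: str) -> List[List[str]]:
    """Parse With/From into OR-alternatives ('|'), each an AND-group (',')."""
    if not with_from:
        return []
    return [group.split(",") for group in with_from.split("|")]


def interpro_commas_to_pipes(with_from: str) -> str:
    """Hypothetical rewrite: an AND-group made only of InterPro IDs is split
    into separate OR-alternatives; all other groups are kept as they are."""
    alternatives = []
    for group in parse_with_from(with_from):
        if len(group) > 1 and all(i.startswith("InterPro:") for i in group):
            alternatives.extend(group)  # each InterPro ID becomes its own alternative
        else:
            alternatives.append(",".join(group))
    return "|".join(alternatives)
```

For example, `InterPro:IPR000001,InterPro:IPR000002` would become `InterPro:IPR000001|InterPro:IPR000002`, while a mixed group such as a UniProt ID plus an FBgn stays a comma (AND) group.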
I’ll just add these IDs for myself, as they give a range of different cases to think about:
| UniProt ID | FlyBase gene ID | Symbol |
| -- | -- | -- |
| Q9W0H3 | FBgn0035206 | sturkopf |
| Q9VM50 | FBgn0031882 | Rab30 |
| Q7KVP9 | FBgn0261850 | Xpd |