If a corpus query response is sizeable, another search is performed starting from the end index of the previous one. However, the `xml()` method just returns a concatenation of all BlackLab XML responses: https://github.com/INL/chaining-search/blob/ff005f075c4ffdc6c93df0346f0af32375daec8f/chaininglib/search/CorpusQuery.py#L278
The issue with this is that, essentially, we're combining multiple standalone XML files into one string. Feeding this string into any XML parser will fail, since the result contains multiple XML declarations (and multiple root elements), which is not well-formed XML.
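To illustrate (this is a standalone demonstration, not using chaining-search itself): concatenating two well-formed XML documents yields a string that a standard parser rejects, because a single document may contain only one XML declaration and one root element.

```python
import xml.etree.ElementTree as ET

# Two mock BlackLab-style responses, each a complete XML document on its own.
response_1 = '<?xml version="1.0" encoding="utf-8"?><hits><hit start="0"/></hits>'
response_2 = '<?xml version="1.0" encoding="utf-8"?><hits><hit start="20"/></hits>'

# Each response parses fine individually.
assert ET.fromstring(response_1).tag == "hits"

# Their concatenation does not: the parser stops at the second declaration/root.
concatenated = response_1 + response_2
try:
    ET.fromstring(concatenated)
    print("parsed")
except ET.ParseError as e:
    print("parse failed:", e)
```

Running this prints a "junk after document element"-style parse error for the concatenated string, even though each part is valid on its own.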
Unfortunately, I don't see how the `xml()` method itself can be improved. There doesn't seem to be an elegant way to combine the information from multiple responses into one document, but I don't think returning broken XML is a viable option either.
Some other options:

- Always return a list of all XML responses, regardless of how many requests were made.
- Make the `xml()` method index-based, so it returns the XML response at that index. This implies there should also be a way to find out how many requests were made in the first place.
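The second option could look something like the following sketch. The class and method names here (`request_count`, the optional `index` parameter) are my own invention for illustration, not the actual chaining-search API:

```python
class CorpusQueryResult:
    """Hypothetical result object that keeps per-request responses separate."""

    def __init__(self, responses):
        # One XML string per underlying BlackLab request.
        self._responses = list(responses)

    def request_count(self):
        """How many BlackLab requests were made for this query."""
        return len(self._responses)

    def xml(self, index=None):
        """Return the XML of the response at `index`, or a list of all of them."""
        if index is None:
            return list(self._responses)
        return self._responses[index]


result = CorpusQueryResult([
    '<?xml version="1.0"?><hits><hit start="0"/></hits>',
    '<?xml version="1.0"?><hits><hit start="20"/></hits>',
])
print(result.request_count())  # 2
print(result.xml(0))           # first response only, still well-formed XML
```

Either way, each returned string is a complete XML document that parses on its own, which avoids the concatenation problem entirely.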
Of course, I'm just thinking out loud here. My current workaround is to use `CorpusQuery._response`, which also contains the individual responses separately (which is what I need).