Closed bencomp closed 3 years ago
I'll let Tom, Nishad and Phil, who are writing code to do this, give their answers. I do have in mind some tests for this. I'll try to mock those up shortly.
Hi @bencomp, the short answer is that the logic has to be programmed into the processor. The code we have written so far parses CSVs sequentially by row from the top (using Python's built-in csv module), so this is just a matter of keeping track of the latest value for shapeID. I expect there will be situations where processing would not be sequential, and there this might be more problematic. I have wondered about a macro to process a TAP directly in Google Sheets or Excel. That's not a type of scripting that I have much experience with, but it might be harder there, requiring some pre-processing like the fill-down transformation in OpenRefine.
Did you have any specific processing scenario in mind?
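For illustration, the sequential approach described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the sample rows, the propertyID values, and the function name group_statements_by_shape are all hypothetical, and it assumes that a blank shapeID cell means "same shape as the row above".

```python
import csv
import io

# Hypothetical sample TAP; the rows and values are illustrative only.
SAMPLE_TAP = """shapeID,shapeLabel,propertyID
:book,Book,dct:title
,,dct:creator
:author,Author,foaf:name
"""


def group_statements_by_shape(csv_text):
    """Read rows top-down, carrying the latest non-empty shapeID forward."""
    shapes = {}
    current_shape = None  # stays None until the first shapeID is seen
    for row in csv.DictReader(io.StringIO(csv_text)):
        shape_id = (row.get("shapeID") or "").strip()
        if shape_id:
            # A non-empty cell starts (or returns to) a shape.
            current_shape = shape_id
        # A blank cell leaves current_shape unchanged, so the row
        # is attached to the shape named most recently above it.
        shapes.setdefault(current_shape, []).append(row["propertyID"])
    return shapes


shapes = group_statements_by_shape(SAMPLE_TAP)
print(shapes)
```

Because the function depends on current_shape from earlier iterations, it only gives the intended result when rows are read in file order from the top.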
@philbarker
situations where processing would not be sequential
In all of the use cases we have considered, processing would be sequential, so it would be great if we could say that this is the default.
I cannot think of cases where it would not be sequential, though that might just be my lack of imagination. If there were such cases, they would have to be pretty common, in my opinion, to justify changing the more "obvious" default.
@tombaker OK, fair point. It would have been better if I had been more specific and said "sequential starting at the top", because it is the assumption that processing starts at the top that is probably more questionable once you go beyond imperative programming. But I think we can say that top-down preprocessing might be a necessary step before the statements in a profile can be considered as independent constraints.
@philbarker
top-down preprocessing might be a necessary step
Is that any different from saying that "top-down processing might be a necessary first step"? The "pre" makes me unsure.
I think sometimes it would be different. Using something like the OpenRefine fill down transformation might be a separate manual step that needs doing before any macros or other code that assumes each row is self-contained would work.
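As a rough sketch of what such a separate fill-down step might look like outside OpenRefine, the following uses only Python's csv module. The function name fill_down, the sample rows, and the choice of columns are assumptions made for illustration; this is not how OpenRefine itself is implemented.

```python
import csv
import io

# Hypothetical sample TAP; the rows and values are illustrative only.
SAMPLE_TAP = """shapeID,shapeLabel,propertyID
:book,Book,dct:title
,,dct:creator
:author,Author,foaf:name
"""


def fill_down(rows, columns):
    """Yield copies of rows, filling blank cells in `columns` from above."""
    last_seen = {}
    for row in rows:
        filled = dict(row)
        for col in columns:
            value = (filled.get(col) or "").strip()
            if value:
                last_seen[col] = value  # remember the latest non-blank value
            else:
                filled[col] = last_seen.get(col, "")  # copy it into the blank
        yield filled


reader = csv.DictReader(io.StringIO(SAMPLE_TAP))
rows = list(fill_down(reader, ["shapeID", "shapeLabel"]))

# Once filled, each row is self-contained and order no longer matters.
for row in reversed(rows):
    print(row["shapeID"], row["shapeLabel"], row["propertyID"])
```

Each column is filled independently here, so a row that names a new shapeID but leaves shapeLabel blank would inherit the previous shape's label. Whether that is acceptable depends on how the profile is written.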
Thanks for the comments, all. In Python I tend to use Pandas to load CSVs, and Pandas has optimised ways to process rows in parallel. This of course requires that, like in relational databases, rows are independent of each other. I wasn't planning on processing any TAP using Pandas. Maybe it was just my experience teaching Data organisation in spreadsheets kicking in.
I think we've come to a comfort level with the answers here. @bencomp are you ok if we close this? Thanks.
Agreeing to close, although we should provide examples with all cells filled in.
When the specification notes:
… how would a processor understand to repeat the shape ID and label from the rows above it? This behaviour is not native to CSV. (In OpenRefine, you could use the Fill down transformation to get the intended result.)