aixm / donlon-outdated-

Previous AIXM 5.1 Donlon data set. No longer maintained.

Splitting Donlon #24

Closed AGhencea closed 8 years ago

AGhencea commented 9 years ago

Good Afternoon,

To make the data set easier to work with and to get a more specialized view of what it contains, Eddy suggested splitting Donlon into the following sub-files:

Does anyone think this would be a problem? Or do you think we should have another structure?

Please let me know your thoughts about this idea once you get a chance.

Thank you in advance!

Regards, Andrei Ghencea

vog commented 9 years ago

This makes a lot of sense to me. Some thoughts:

  1. I propose adding an XSLT script "Donlon.xsl" that assembles the large Donlon dataset out of the parts (using the XSLT document() function to include all parts).
  2. Is it possible to make each individual XML file self-contained, in the sense that all references (xlink:href) point only to features within the same file?
  3. As a side note: the main reason the Donlon XML is so large is the metadata. We had to copy the metadata blob into each individual feature to ensure it won't get lost after insertion into a WFS or similar systems (see #20). If we shorten that metadata, we could also reduce the Donlon XML size considerably. However, splitting the file may be worthwhile nevertheless.

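The assembly step from point 1 can be sketched as follows. This is a minimal illustration in Python rather than XSLT, and it assumes each part is an AIXM message whose features sit in `hasMember` children of the root; the sample parts and the `assemble` helper are hypothetical, not the script discussed in this thread.

```python
import xml.etree.ElementTree as ET

# AIXM 5.1 message namespace; the AIXMBasicMessage/hasMember layout of the
# parts is an assumption made for this sketch.
MSG_NS = "http://www.aixm.aero/schema/5.1/message"
HAS_MEMBER = f"{{{MSG_NS}}}hasMember"


def assemble(part_roots):
    """Collect the hasMember children of every part under one new root."""
    roots = list(part_roots)
    if not roots:
        raise ValueError("at least one part is required")
    # Reuse the tag and attributes of the first part for the combined message.
    combined = ET.Element(roots[0].tag, dict(roots[0].attrib))
    for root in roots:
        # findall() with a plain tag matches direct children only.
        combined.extend(root.findall(HAS_MEMBER))
    return combined


# Tiny in-memory stand-ins for the split files (hypothetical content).
PART_A = (
    f'<m:AIXMBasicMessage xmlns:m="{MSG_NS}">'
    "<m:hasMember><AirportHeliport/></m:hasMember>"
    "</m:AIXMBasicMessage>"
)
PART_B = (
    f'<m:AIXMBasicMessage xmlns:m="{MSG_NS}">'
    "<m:hasMember><Navaid/></m:hasMember>"
    "<m:hasMember><DesignatedPoint/></m:hasMember>"
    "</m:AIXMBasicMessage>"
)

whole = assemble([ET.fromstring(PART_A), ET.fromstring(PART_B)])
member_count = len(whole.findall(HAS_MEMBER))
print(member_count)
```

With real files, `ET.parse(path).getroot()` would supply the roots. Anything outside `hasMember` (for example a bounding box on the message root) is ignored by this sketch.
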
AGhencea commented 9 years ago

Hello Volker,

  1. I assume the XSLT would be really useful and not hard to create, so I totally agree with it.
  2. I don't think it would be possible to have only self-contained references. For example, an attribute of a feature A could refer to another feature B that is in a different file. To make A's reference self-contained you would have to copy B into A's file, which increases the file size. In addition, assembling the entire file would then produce a lot of duplicates. This is only my opinion, so let's wait for some more input and see whether this can be done.
  3. Can we shorten the metadata without affecting any of the capabilities of the current data-set?
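Whether a given part is self-contained in the sense of point 2 can be checked mechanically. The sketch below assumes references take the form `xlink:href="urn:uuid:<value>"` and resolve against the text of a `gml:identifier` element; the element names in the sample and the `external_references` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml/3.2"
XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
IDENTIFIER = f"{{{GML_NS}}}identifier"


def external_references(root):
    """Return xlink:href values that no gml:identifier in the same tree satisfies."""
    # Assumption: a reference "urn:uuid:X" targets the feature whose
    # gml:identifier text is X.
    local = {"urn:uuid:" + (el.text or "").strip() for el in root.iter(IDENTIFIER)}
    hrefs = {el.get(HREF) for el in root.iter() if el.get(HREF) is not None}
    return sorted(hrefs - local)


# Hypothetical part: feature A points at B (not in this file) and C (in this file).
SAMPLE = f"""
<part xmlns:gml="{GML_NS}" xmlns:xlink="{XLINK_NS}">
  <featureA>
    <gml:identifier codeSpace="urn:uuid:">aaaa</gml:identifier>
    <refersTo xlink:href="urn:uuid:bbbb"/>
    <alsoRefersTo xlink:href="urn:uuid:cccc"/>
  </featureA>
  <featureC>
    <gml:identifier codeSpace="urn:uuid:">cccc</gml:identifier>
  </featureC>
</part>
""".strip()

missing = external_references(ET.fromstring(SAMPLE))
print(missing)
```

An empty result would mean the part references only features it contains; any entry in the list is a reference that crosses a file boundary.
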

Andrei

vog commented 9 years ago

Regarding 2, I agree that we should not duplicate data. My proposal is to split the file in such a way that the parts don't have to duplicate data, i.e. to split the Donlon dataset along "natural" boundaries. Not sure if this is even possible, though.

Regarding 3, this is not so much a technical question but a policy question. I believe that this decision is up to Eddy.

AGhencea commented 9 years ago

Going back to 1, Eddy suggested producing the complete Donlon file by using XML's XInclude mechanism. I think it's more efficient, since we won't need to create any additional files or use other processors (Xalan, Saxon). What do you think about it?
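For illustration, this is roughly how XInclude-based assembly behaves with a parser that supports it. The sketch uses Python's standard `xml.etree.ElementInclude`; the master document, the part file names and their contents are all hypothetical, and the parts are served from memory so the example runs without any files on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical master file that pulls in two parts via xi:include.
MASTER = """<Donlon xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="part_airports.xml"/>
  <xi:include href="part_navaids.xml"/>
</Donlon>"""

# Hypothetical part files, kept in memory for the sketch.
PARTS = {
    "part_airports.xml": "<Airports><AirportHeliport/></Airports>",
    "part_navaids.xml": "<Navaids><Navaid/><DesignatedPoint/></Navaids>",
}


def load_part(href, parse, encoding=None):
    """Loader callback: return the parsed part instead of reading from disk."""
    if parse != "xml":
        raise ValueError("this sketch only handles XML includes")
    return ET.fromstring(PARTS[href])


root = ET.fromstring(MASTER)
# Replaces each xi:include element in place with the root of the loaded part.
ElementInclude.include(root, loader=load_part)
child_tags = [child.tag for child in root]
print(child_tags)
```

Note that each included part arrives as its whole root element. Including only selected children of a part would need XPointer support, which `ElementInclude` does not provide.
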

Concerning 2, unfortunately it's almost impossible to separate them in a natural way, as they are all linked together. Either we split them into specialized categories and create external references, or we keep them all together. After we reviewed the document, we decided to add one more file - Donlon_Naviads_DesignatedPoint.

Regarding 3, Eddy explained to me that the amount of metadata cannot be reduced. I know that all the metadata in the file is identical, but in real situations this won't be the case: every feature's metadata will be different, and we should try to vary the ones in Donlon as well. So unfortunately, this one is not a viable solution either.

vog commented 9 years ago

Regarding 1: This is a nice idea! Although for our (m-click) purposes that wouldn't help much, I agree that for many other purposes the XInclude approach will be simpler than having to use an XSLT processor.

Regarding 2: Okay, I see. That was just an idea.

Regarding 3: I already suspected that this won't be feasible. I just wanted to make sure that you are aware of this possibility.

AGhencea commented 9 years ago

Thank you for your input, Volker.

Does anyone else have any concerns or comments regarding this topic? If yes, please let me know by next Monday before COB as I would like to start working on it.

Any comments or suggestions are welcome at any time during the process, so please feel free to bring them to the table.

Thank you!

Regards, Andrei

AGhencea commented 9 years ago

Good morning,

While trying to use XInclude to create the complete Donlon file, I discovered that the parsers used by most software don't support it. In this case, we have to go back to Volker's idea and create an XSLT script to assemble the file.

Please let me know if you have any concerns regarding this approach and we will discuss it here.

Thank you, Andrei

AGhencea commented 9 years ago

Good morning everyone,

I finished the script for assembling the Donlon data set and uploaded it to Google Drive. Below is the link to the folder, which includes all the files - Donlon as it was, all the sub-files created from it, and the XSLT script along with the bat file to run it.

https://drive.google.com/folderview?id=0B4YVUcdz9K4ULXRVRW9zMnluemM&usp=sharing

Please have a look, try to assemble different versions of the data set and let me know if you have any comments or suggestions.

Thank you and regards, Andrei

vog commented 8 years ago

@porosnie @AGhencea What is the current state of this work? Do you need any assistance from me to incorporate the split Donlon dataset into the Git repository?

porosnie commented 8 years ago

As discussed in the CCB Webex, I am thinking of a new approach. The problem with splitting Donlon is that it takes time and can conflict with further updates made online. Therefore, I would propose the following solution:

vog commented 8 years ago

Created a new issue #27 to cover the new strategy. Closing this one.