First draft of Capitains metadata prototypes, XML metadata classes, and local XML resolver based on the new guidelines

sonofmun commented 4 years ago

At this point, it works with __capitains__.xml files in the mold of those in the guidelines (https://github.com/Capitains/guidelines/blob/master/data/martial/epigrammata/__capitains__.xml and https://github.com/Capitains/guidelines/blob/master/data/martial/__capitains__.xml). It does not go any farther than parsing the children named in <members>, i.e., it does not follow the links to other metadata files. This still needs work, so this PR is here to give concrete code for discussion.

sonofmun commented 4 years ago

Are the Travis tests not required for merging?

coveralls commented 4 years ago

Coverage increased (+0.9%) to 95.008% when pulling 24848928b544e57aa67de45e42d48ea1f4346b1b on capitains_xml into 30123cd650c82e03fa3119537135b81bfafdc753 on future.

sonofmun commented 4 years ago

This is a deeper dive than I was expecting. I have created specific collection prototypes for the new guidelines for readable and collection. These appear to be working (at least with my data). I have also started working on a specific local parser that follows the new guidelines. I expect that next week I will add both of those to this PR.

PonteIneptique commented 4 years ago

That's wonderful ;)

sonofmun commented 4 years ago

So I have the basic functionality of the new metadata prototypes and a new resolver based on the new CapiTainS guidelines working. I think my code is still pretty ugly but they both represent another step away from the dependence on CTS. I have written no test up to now because I assume that at least some of the classes will change significantly before it is ready for release.

sonofmun commented 4 years ago

A list of the major changes that spring to mind that are contained here:

The types of collections have been reduced significantly. The mapping from the former types to the current types is as follows
- CapitainsCtsText --> CapitainsCtsText (The new guidelines for texts are not yet ready)
- XmlCtsEditionMetadata --> XmlCapitainsReadableMetadata
- XmlCtsTranslationMetadata --> XmlCapitainsReadableMetadata
- XmlCtsCommentaryMetadata --> XmlCapitainsReadableMetadata
- XmlCtsWorkMetadata --> XmlCapitainsCollectionMetadata
- XmlCtsTextgroupMetadata --> XmlCapitainsCollectionMetadata
- XmlCtsTextInventoryMetadata --> XmlCapitainsCollectionMetadata
- CtsTextInventoryCollection --> XmlCapitainsCollectionMetadata
- XmlCtsCitation --> XmlCapitainsCitation (again, this will depend on the new text format in the guidelines)
The new local resolver can parse collections at any depth instead of being limited by the textgroup, work, text levels of the previous guidelines.
The metadata prototypes (in MyCapytain.resources.prototypes.capitains.collection) do not rely on the MyCapytain.common.reference._capitians_cts.URN class. This class has been replaced by a simple str representation instead. It may be that we will want a new class to reproduce some of the functionality of the URN class. But that seemed way too dependent on the URN formatting for the previous guidelines to be very useful here.

If I think of any more major changes that slipped my mind, I will add them as more comments here.

sonofmun commented 4 years ago

I have done some more work here, cleaning up unnecessary classes and functions, refactoring of function and parameter names that were dependent on the previous guidelines (e.g., anything with 'work' or 'textgroup' in the name), and a bit more generalization for the local resolver functions. There are still some extra, unused classes in resources.prototypes.capitains.collection. I am leaving those there for now. I can remove them once it is clear that they will not be needed.

PonteIneptique commented 4 years ago

Do you need some input from me here ?

sonofmun commented 4 years ago

I am mostly interested here in whether you think that I am on the right track with what I am doing here. If you generally think that things are OK, then I will go ahead and write unit tests for this stuff. And then we discuss the details once the PR if fully ripe.

PonteIneptique commented 4 years ago

I honestly think you are, I am wondering about some little things (given how DTS work, should we keep the dispatcher which was a hack for CTS resources ? Or should we keep it ?). Probably we should have a test repo where we look at what it creates with the current status.

sonofmun commented 4 years ago

I would lean toward keeping the things we have and adding more stuff, as long as it doesn't make the code base completely unwieldy. I would like to preserve as much backward compatibility as possible. If we can do that, then we only have one version to maintain instead of needing to maintain the old one in order to accommodate users who don't want to update their corpora to the new guidelines. In any case, I will now move over to writing unit tests and finishing the integration of these changes into Nautilus and HookTest.

PonteIneptique commented 4 years ago

Perfect !

sonofmun commented 4 years ago

There is a problem with how the resolver deals with children of a collection that is referred to twice. Let's start with examples of the __capitains__.xml files. The original collection would look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="../../capitains.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<collection xmlns:ti="http://chs.harvard.edu/xmlns/cts"
            xmlns:dct="http://purl.org/dc/terms/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns="http://purl.org/ns/capitains"
            xmlns:owl="http://www.w3.org/2002/07/owl#"
            xmlns:bib="http://bibliotek-o.org/1.0/ontology/"
            xmlns:cts="http://chs.harvard.edu/xmlns/cts"
            xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <identifier>urn:cts:formulae:passau</identifier>
   <dc:title xml:lang="deu">Die Traditionen des Hochstifts &lt;span class="collection-origin"&gt;Passau&lt;/span&gt;</dc:title>
   <dc:type>cts:textgroup</dc:type>
   <structured-metadata>
      <bib:AbbreviatedTitle>Traditionen Passau</bib:AbbreviatedTitle>
   </structured-metadata>
   <members>
      <collection path="./heuwieser0073/__capitains__.xml"
                  identifier="urn:cts:formulae:passau.heuwieser0073"/>
      <collection path="./heuwieser0083/__capitains__.xml"
                  identifier="urn:cts:formulae:passau.heuwieser0083"/>
   </members>
</collection>

Then, the file where this file is part of another collection would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="../../capitains.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<collection xmlns:ti="http://chs.harvard.edu/xmlns/cts"
            xmlns:dct="http://purl.org/dc/terms/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns="http://purl.org/ns/capitains"
            xmlns:owl="http://www.w3.org/2002/07/owl#"
            xmlns:bib="http://bibliotek-o.org/1.0/ontology/"
            xmlns:cts="http://chs.harvard.edu/xmlns/cts"
            xmlns:foaf="http://xmlns.com/foaf/0.1/">
   <identifier>a:different.identifier</identifier>
   <dc:title xml:lang="deu">This is a collected collection</dc:title>
   <dc:type>cts:textgroup</dc:type>
   <structured-metadata>
      <bib:AbbreviatedTitle>Collected</bib:AbbreviatedTitle>
   </structured-metadata>
   <members>
      <collection path="../passau/heuwieser0083/__capitains__.xml" identifier="urn:cts:formulae:passau.heuwieser0083"/>
      <collection readable="true" path="../passau/heuwieser0073/passau.heuwieser0073.lat001.xml">
         <identifier>urn:cts:formulae:passau.heuwieser0073.lat001</identifier>
         <parent>urn:cts:formulae:passau.heuwieser0073</parent>
         <dc:title xml:lang="deu">Die Traditionen des Hochstifts Passau (Ed. Heuwieser) Nr. 73</dc:title>
         <dc:description xml:lang="deu">ZEUGENAUSSAGEN (AUFZEICHNUNGEN) ÜBER DEN DER PASSAUER KIRCHE GEHÖRIGEN BESITZ DER KLÖSTERCHEN ROTT (POSTMÜNSTER) UND RIMBACH, ZU UTTENDORF UND OBERBIERBACH (BAYERBACH).</dc:description>
         <dc:language>lat</dc:language>
         <dc:type>cts:edition</dc:type>
         <dc:contributor>Prof. Dr. Philippe Depreux (Universität Hamburg)</dc:contributor>
         <dc:contributor>Franziska Quaas (Universität Hamburg)</dc:contributor>
         <dc:contributor>Margherita Mariani (Studentische Hilfskraft, Universität Hamburg)</dc:contributor>
         <dc:contributor>Matthew Munson (Universität Hamburg)</dc:contributor>
         <dc:contributor>Morgane Pica (Praktikantin, Ecole nationale des Chartes, Paris, Frankreich)</dc:contributor>
         <dc:publisher xml:lang="mul">Formulae-Litterae-Chartae Projekt</dc:publisher>
         <dc:format>application/tei+xml</dc:format>
         <dc:source>Die Traditionen des Hochstifts Passau Nr. 73, in: Max Heuwieser, Die Traditionen des Hochstifts Passau, München 1930 1969, [URI: http://d-nb.info/458436127], S. 61-62.</dc:source>
         <structured-metadata>
            <dct:abstract xml:lang="deu"/>
            <bib:editor>Heuwieser, Max</bib:editor>
            <dct:dateCopyrighted>1930 1969</dct:dateCopyrighted>
            <dct:created/>
            <dct:bibliographicCitation>Die Traditionen des Hochstifts Passau Nr. 73, in: Max Heuwieser, Die Traditionen des Hochstifts Passau, München 1930 1969, [URI: http://d-nb.info/458436127], S. 61-62.</dct:bibliographicCitation>
         </structured-metadata>
      </collection>
      <collection readable="true" path="../passau/heuwieser0073/passau.heuwieser0073.lat005.xml">
         <identifier>urn:cts:formulae:passau.heuwieser0073.lat005</identifier>
         <parent>urn:cts:formulae:passau.heuwieser0073</parent>
         <dc:title xml:lang="deu">Die Traditionen des Hochstifts Passau (Ed. Heuwieser) Nr. 73</dc:title>
         <dc:description xml:lang="deu">ZEUGENAUSSAGEN (AUFZEICHNUNGEN) ÜBER DEN DER PASSAUER KIRCHE GEHÖRIGEN BESITZ DER KLÖSTERCHEN ROTT (POSTMÜNSTER) UND RIMBACH, ZU UTTENDORF UND OBERBIERBACH (BAYERBACH).</dc:description>
         <dc:language>lat</dc:language>
         <dc:type>cts:edition</dc:type>
         <dc:contributor>Prof. Dr. Philippe Depreux (Universität Hamburg)</dc:contributor>
         <dc:contributor>Franziska Quaas (Universität Hamburg)</dc:contributor>
         <dc:contributor>Margherita Mariani (Studentische Hilfskraft, Universität Hamburg)</dc:contributor>
         <dc:contributor>Matthew Munson (Universität Hamburg)</dc:contributor>
         <dc:contributor>Morgane Pica (Praktikantin, Ecole nationale des Chartes, Paris, Frankreich)</dc:contributor>
         <dc:publisher xml:lang="mul">Formulae-Litterae-Chartae Projekt</dc:publisher>
         <dc:format>application/tei+xml</dc:format>
         <dc:source>Die Traditionen des Hochstifts Passau Nr. 73, in: Max Heuwieser, Die Traditionen des Hochstifts Passau, München 1930 1969, [URI: http://d-nb.info/458436127], S. 61-62.</dc:source>
         <structured-metadata>
            <dct:abstract xml:lang="deu"/>
            <bib:editor>Heuwieser, Max</bib:editor>
            <dct:dateCopyrighted>1930 1969</dct:dateCopyrighted>
            <dct:created/>
            <dct:bibliographicCitation>Die Traditionen des Hochstifts Passau Nr. 73, in: Max Heuwieser, Die Traditionen des Hochstifts Passau, München 1930 1969, [URI: http://d-nb.info/458436127], S. 61-62.</dct:bibliographicCitation>
         </structured-metadata>
      </collection>
   </members>
</collection>

Both of these files refer to the __capitains__.xml file in the passau/heuwieser0083/ folder. These two collections are then parsed as XmlCapitainsCollectionMetadata, which produces the following representations of their children: The first file:

[XmlCapitainsCollectionMetadata(urn:cts:formulae:passau.heuwieser0073)#140265110261888, XmlCapitainsCollectionMetadata(urn:cts:formulae:passau.heuwieser0083)#140265110264856]

The second file:

[XmlCapitainsReadableMetadata(urn:cts:formulae:passau.heuwieser0073.lat001)#140265110265584, XmlCapitainsReadableMetadata(urn:cts:formulae:passau.heuwieser0073.lat005)#140265110265136, XmlCapitainsCollectionMetadata(urn:cts:formulae:passau.heuwieser0083)#140265110263904]

So Passau 83 in the first collection is represented by the object XmlCapitainsCollectionMetadata(urn:cts:formulae:passau.heuwieser0083)#140265110264856 and in the second collection by the object XmlCapitainsCollectionMetadata(urn:cts:formulae:passau.heuwieser0083)#140265110263904. Both of these object, however, have the same id, i.e., 'urn:cts:formulae:passau.heuwieser0083'. Then, when the resolver dispatches the two collection with this code: https://github.com/Capitains/MyCapytain/blob/c93b86cb1ddbab106320a6f0b211e1a67c917022/MyCapytain/resolvers/capitains/local.py#L206-L219 it correctly adds the children to the original collection, since this collection was dispatched first, but when it comes to the second collection, the object from the first collection is what is called with the line https://github.com/Capitains/MyCapytain/blob/c93b86cb1ddbab106320a6f0b211e1a67c917022/MyCapytain/resolvers/capitains/local.py#L213 and that is then updated with the children from the second collection. The object from the second collection, however, is never updated with its children. So that means that the second collection is never followed all the way to its readable descendants.

sonofmun commented 4 years ago

Regarding our conversation earlier today @PonteIneptique , I don't think that having dictionaries and then parsing later will solve this problem. Once you parse an XmlCapitainsCollectionMetadata, it is the object that is set as the member and not the identifier. So the descendants need to be added to that object, not just to the identifier. Perhaps the way to go would be to replace the actual objects from the second collection with the objects from the first, but I am not sure what the best way to do this is. Or even if it is possible with the system as it now is.

PonteIneptique commented 4 years ago

One way to deal with that is that the Collection object parse method should have a dictionary which would provide already parsed elements, that way, it would prevent parsing a second time.

sonofmun commented 4 years ago

Hmm. How would I do that without making XmlCapitainsCollectionMetadata, which is where the parsing of the XML happens, dependent on the resolver? XmlCapitainsCollectionMetadata doesn't know whether anything else has been parsed. And I don't know if it should.

PonteIneptique commented 4 years ago

Make it an optional parameter that is used only if it is present (or in this case, created if it is not)

def parse(xml, parsed=None):
    if parsed is None:
        parsed = {}
    # Then use it and transmit it to further parsing, this way, if you end up with a cycle (children connected to grand parents), it does not reparse !

PonteIneptique commented 4 years ago

Note that the same way the resolver could be optional. And I think it would be fine, because it might actually simplify a lot of things to make those dependants. Except for hooktest, things are always looked at through the lens of a resolver. And the resolver might be the best way to handle relationships between objects !

sonofmun commented 4 years ago

I think I like this second option (of handling relationships through the resolver) better. That would mean that XmlCapitainsCollectionMetadata would only carry metadata at the collection level and wouldn't know anything about its children, or at least not until the resolver injected that information into the collection. I think it is a good idea to keep the information about a collections descendants on the collection object itself. But I think it is a good idea to add these things to the collection through the resolver and not during the parsing of the collection itself. I will start to work in this direction and see what I can come up with.

PonteIneptique commented 4 years ago

Note that I do not think they are both contradictory, I think they might even work better together :)

sonofmun commented 4 years ago

I have come up with a solution. Let me know what you think @PonteIneptique , especially from a performance standpoint. I will write unit tests over the next couple of days to make sure that it does what it is supposed to do.

sonofmun commented 4 years ago

I have reactivated the ability within XmlCapitainsCollectionMetadata.parse that the collection be connected to its children. This still works with the resolver since the resolver will update the dictionary of children and overwrite those children with the ones that should be used within the resolver environment.

sonofmun commented 4 years ago

Question @PonteIneptique : Should a readable text be listed twice under readableDescendants if it occurs in two different collections, e.g., in its original collection and in a collected collection that pulls members from this original collection? This deals with the case already illustrated above.

PonteIneptique commented 4 years ago

Question @PonteIneptique : Should a readable text be listed twice under readableDescendants if it occurs in two different collections, e.g., in its original collection and in a collected collection that pulls members from this original collection? This deals with the case already illustrated above.

Nope, readableDescendants are thought to be unique, because of the way CTS worked. This is not documented though. It should be an ordered set in some way :)

sonofmun commented 4 years ago

From a Python standpoint, I don't think you can have a set of CapitainsReadableMetadata objects since CapitainsReadableMetadata objects are not hashable. It shouldn't be difficult to work around this, however.

sonofmun commented 4 years ago

And what about plain descendants? Can those be repeated in that case above?

PonteIneptique commented 4 years ago

Same situations. Regarding non-duplicability, that's why dictionaries are useful x[id] = Coll -> x.values() will be a set of collection ;)

sonofmun commented 4 years ago

Quick question @PonteIneptique : In MyCapytain 2, when retrieving a textual node, there was the idea of the canonical text, meaning that if you sent a work-level URN to __getText__, it would return the text from one of the editions of that work. The question is whether we want this functionality for version 3.0. Right now, if you request the text from an object that is not readable, __getText__ throws and error saying the object is not readable. Should it, instead, grab a random readableDescendant of the object? Since we no longer have the concept of edition vs. translation encoded in the object itself, we can't request an edition. My feeling is we should keep it throwing an error and let the implementers deal with that as they see fit, e.g., catch the error and return a text of their choosing, instead of hard-coding behavior in there.

sonofmun commented 4 years ago

OK, so the get_label problem was not solved, at least not in Python 3.5. Check out the code below to see the problem: https://github.com/Capitains/MyCapytain/blob/585510d0bc1cc0035508b9d54059a144a7386ca9/tests/resolvers/guidelines_v3/test_local.py#L593-L659

sonofmun commented 4 years ago

The language label is actually present on the Literal object value in the graph, as is shown by this output:

for predicate, obj in node.metadata.graph[node.metadata.asNode()]:
    try:
        print(obj, 'LANGUAGE', obj.language)
    except:
        continue

OUTPUT

Salzburger Urkundenbuch (Ed. Hauthaler); Codex A Nummer 100 LANGUAGE deu
Hauthaler, Willibald LANGUAGE None
a) Der Edle Vodalhard (Odalhard) übergibt dem Erzbischof 7 Huben am Ergoltsbach (Zufluss der kleinen Laber n. Landshut) mit Vorbehalt von 3 Joch in jeder Zelch und einer Hofstatt auf der Westseite als Hantgemâl (Stammgut), wofür er zu ewigem Eigenthum für sich und seine Nachkommen erhält, was er bisher zu Weidenbach (w. Ampfing) als Eigen innegehabt hat, nur ausgenommen die Kirche, den Kirchhof und eine Baustätte. b) Altorf übergibt dem Erzbischof am gleichen Tage 5 Hubcn zu Ergoltsbach und erhält dafür Volagangesperch (seit 13. Jahrh. Neumarkt a/Rott), wieder ausgenommen die Kirche, den Kirchhof und eine Baustätte. LANGUAGE deu
Salzburger Urkundenbuch (Ed. Hauthaler); Codex A Nummer 100 LANGUAGE deu
lat LANGUAGE None
 LANGUAGE deu
Salzburger Urkundenbuch (Ed. Hauthaler); Codex A Nummer 100 LANGUAGE deu
 LANGUAGE None
1910 LANGUAGE None
Salzburger Urkundenbuch; Codex A Nummer 100, in: Willibald Hauthaler, Salzburger Urkundenbuch Bd. 2, Salzburg 1910, S. 162-163. LANGUAGE None

sonofmun commented 4 years ago

OK. So I think (again) this was user error in that I had labelled the title in the __capitains__.xml file as 'deu' when it should have been 'eng'. Hopefully the tests will pass now.

sonofmun commented 4 years ago

I am not sure what is going on now. The test works in 3.6 but not in 3.5. And they both install the same version of LXML.

sonofmun commented 4 years ago

The problem was that, as of Python 3.6, a dictionary iterator was returned in the insertion order. This was not the case in 3.5. What this meant was that in 3.6, since the top-level collections were always inserted in the id_to_coll dictionary first, they would be dispatched first. And once they were dispatched as children of the default collection, their children would be added to them instead of being dispatched at the top level. In 3.5, however, since the dict did not iterate in insertion order, any random descendant collection could be dispatched before the top-level collection came up for dispatching. The solution I have implemented here is better because it will always dispatch the collections that have no parents first, thus meaning that any collection that has a parent will not be dispatched but, instead, added to its parent.

sonofmun commented 4 years ago

This last change was made to deal with the fact that between Python 3.5 and 3.6 dictionaries changed to be sorted by insertion order. This is the second time I have had to do this. If we are going to continue to support Python 3.5, we may want to start using OrderedDict so that we can always depend on dictionary order.

PonteIneptique commented 4 years ago

I think we could drop < 3.6

sonofmun commented 4 years ago

That would make my life much easier! These problems have made me worry a bit that some functions may rely on dictionary order too much. Perhaps it is something that is not such a problem with dictionaries ordered according to insertion order. Just something to keep an eye on as you work through the PR.

sonofmun commented 4 years ago

I.e., keep in mind where a function depends on dictionary order to do the right thing and perhaps in these cases add some sort of sorting to the dictionary iterator to make sure that the iterator goes off in the correct order.

PonteIneptique commented 4 years ago

Good morning ! Btw, regarding type checking, I found this resource (MyPy) https://realpython.com/python-type-checking/#the-mypy-project that we might want to use :)

sonofmun commented 4 years ago

Some updated information on how much slower the parsing of a collection is with this new code as opposed to the CTS-based code. When parsing a large collection (like our collection of almost 5000 texts), the vast majority of the time in the process is spent dispatching the collection (around 97-98% for both CTS and V3). When we look at the results of the line profiler for the V3 code for _dispatch, we see the following results:

Total time: 530.619 s
File: /home/matt/MyCapytain/MyCapytain/resolvers/capitains/local.py
Function: _dispatch at line 208

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   208                                               def _dispatch(self, collection, directory):
   209                                                   """ Run the dispatcher over a textgroup.
   210                                           
   211                                                   :param collection: Collection object that needs to be dispatched
   212                                                   :param directory: Directory in which the Collection was found
   213                                                   """
   214      4766  265166483.0  55637.1     50.0          if collection.id in self.dispatcher.collection:
   215      4743  265173596.0  55908.4     50.0              coll = self.dispatcher.collection[collection.id]
   216      4743     276352.0     58.3      0.1              coll.update(collection)
   217                                                   else:
   218        23       2254.0     98.0      0.0              self.dispatcher.dispatch(collection, path=directory)

It spent almost 9 minutes in the _dispatch function and it looks like the really problematic part is the access to self.dispatcher. Just calling a key from self.dispatcher and assigning it to a variable took up 50% of the time, which means that this is also why the other 50% of the time was spent And the problem here is that every time you want to access an item in the collection, you call the descendants function, which goes through every descendant every single time. Take a look at the following profiler results:

Total time: 275.269 s
File: /home/matt/MyCapytain/MyCapytain/resources/prototypes/metadata.py
Function: __getitem__ at line 211

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   211                                               def __getitem__(self, key):
   212                                                   """ Retrieve an item by its ID in the tree of a collection
   213                                           
   214                                                   :param key: Key of the object to delete
   215                                                   :return: Collection identified by the item
   216                                                   """
   217     14641      28566.0      2.0      0.0          if key == self.id:
   218                                                       return self
   219     19936      37983.0      1.9      0.0          for obj in self.members:
   220     15193      16357.0      1.1      0.0              if key == obj.id:
   221      9898       3524.0      0.4      0.0                  return obj
   222  21456397  250995195.0     11.7     91.2          for obj in self.descendants:
   223  21456397   23904589.0      1.1      8.7              if obj.id == key:
   224      4743     282552.0     59.6      0.1                  return obj
   225                                                   raise UnknownCollection("%s is not part of this object" % key)

In order to return 4743 values, self.descendants gets called over 21 million times. That averages over 4500 calls per value returned. And even though it doesn't take that long to run (11.7 microseconds), that number of calls simply weighs the system down. I think we need to figure out a more efficient way to walk through the descendants of an object than what we have now, which is:

member_dict = {member.id: member for member in self.members}
member_dict.update({submember.id: submember for member in self.members for submember in member.descendants})
return list(member_dict.values())

This function needs to be recursive as written since it is not clear how deep it should go. Perhaps the best way would be to make a self._descendants attribute on each object? And perhaps this could be calculated during the calls to self._add_member? I am interested to hear your thoughts @PonteIneptique

sonofmun commented 4 years ago

Actually thinking more about this, the easiest answer may be just to make self.descendants return a dictionary. Then we wouldn't need to loop through a list and check every single value. I will see what I can get to work.

PonteIneptique commented 4 years ago

Maybe the dispatcher should really reuse the collection dict of its resolver ? This would avoid changing descendants as well ?

PonteIneptique commented 4 years ago

I'll be free tomorrow morning for a chat if you want

sonofmun commented 4 years ago

I think we should change descendants (as well as members, readableDescendants and any other iterator over the members of a collection) to return a dictionary anyway. Because we use the results of these iterators for two things. Either we want to iterate over all of the objects, in which case we could use dict.items() in place of the list, or we want to find a specific item in the iterator, usually on that item's id, in which case we get code like we have above in the __getitem__ function where we have to iterate through the whole list of descendants and test the id of every one just to find a single object. If we could rewrite that portion of __getitem__ to return self.descendants[obj.id], it would save a lot of loops and a lot of time. I think I should have time to talk tomorrow morning. But we have our meeting right now. I will let you know afterwards what time would work.

sonofmun commented 4 years ago

Changing descendants so that it returns a dictionary did not work. It actually slowed things down significantly.

sonofmun commented 4 years ago

I have made almost all of the changes here. The only one that I would have liked to have made that I didn't was refactoring parent to parents. I think that the @parent.setter may have caused some problems here, though I am not 100% sure. In any case, maybe this is something that we can put off until we move the relationships to the resolver.

PonteIneptique commented 4 years ago

I'll let you squash and merge, so that this stays your commit in merging. I do not feel like the full history is important here.

PonteIneptique commented 4 years ago

But be sure to squash

Capitains / MyCapytain

First draft of Capitains metadata prototypes, XML metadata classes, and local XML resolver based on the new guidelines #199