Metadata deepcopy

Description and Motivation

Main Modification: convokit/model/convoKitMeta.py

The new version of ConvoKit supports DB mode, but corpus metadata does not behave the same way in DB mode as in MEM mode. Every operation on MongoDB copies data from the database to the Python process (or vice versa), so in-place mutations of mutable metadata fields are never written back to the DB, which causes data loss. We therefore treat all metadata values as immutable, so that metadata behaves consistently across both modes.

In light of this, when a user accesses a metadata field, we deep copy its value unless it is one of the common immutable datatypes. Instead of returning a reference to the stored object (in MEM mode), we return a copy of that metadata field, and any mutation to the copy is not reflected in the corpus metadata storage.

For example, suppose the metadata entry "foo" is a list. After saved_foo = my_utt.meta["foo"], saved_foo is a deep copy of my_utt.meta["foo"]. Calling saved_foo.append("new value") raises no error, but it changes only the copy saved_foo; my_utt.meta["foo"] is not modified.
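The copy-on-access behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in convokit/model/convoKitMeta.py: the class name CopyOnReadMeta and the IMMUTABLE_TYPES tuple are hypothetical, and the real set of types treated as immutable may differ.

```python
from copy import deepcopy

# Types returned without copying in this sketch.
# Assumption: this list is illustrative, not ConvoKit's actual list.
IMMUTABLE_TYPES = (int, float, str, bool, complex, bytes, type(None))


class CopyOnReadMeta:
    """Hypothetical stand-in for a metadata container (not ConvoKit's class)."""

    def __init__(self):
        self._fields = {}

    def __setitem__(self, key, value):
        # Replacing a whole field writes straight to storage.
        self._fields[key] = value

    def __getitem__(self, key):
        value = self._fields[key]
        if isinstance(value, IMMUTABLE_TYPES):
            return value
        # Anything else is deep copied, so callers never hold a
        # reference to the stored object.
        return deepcopy(value)


meta = CopyOnReadMeta()
meta["foo"] = ["a"]

saved_foo = meta["foo"]          # a deep copy of the stored list
saved_foo.append("new value")    # no error, but only the copy changes
```

After this runs, saved_foo holds both items while the list stored under "foo" still holds only "a".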

Note that this does not affect replacing an entire metadata field: my_utt.meta["foo"] = 1 works as intended.
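One practical consequence is that an in-place update has to be written as read, modify, then assign the whole field back. The sketch below illustrates that pattern; CopyOnReadDict is a hypothetical stand-in that deep copies every value on item access, used here only so the example is self-contained, and it is not ConvoKit's implementation.

```python
from copy import deepcopy


class CopyOnReadDict(dict):
    """Hypothetical stand-in: returns a deep copy of a value on item access."""

    def __getitem__(self, key):
        return deepcopy(super().__getitem__(key))


meta = CopyOnReadDict(foo=["a"])

# Mutating the returned copy in place does not reach storage.
meta["foo"].append("lost")

# Read the field, modify the copy, then replace the entire field.
updated = meta["foo"]
updated.append("kept")
meta["foo"] = updated
```

Only the second update survives, because it ends with an assignment to the whole field.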

Also note that the test convokit/tests/phrasing_motifs/test_questionSentences.py relied on the mutability of metadata fields when creating its test corpus. We fixed it accordingly.

How has this been tested?

Passed all unit tests.

CornellNLP / ConvoKit

Metadata deepcopy #195
