Initialize default conversation ID for utts missing a conversation ID

Description

When a Corpus is initialized with utterances, we now check that every utterance has a conversation ID. An utterance that is missing one is either:
- assigned an ID of the form __default_conversation__{root_utt_id}, where root_utt_id is the ID of the root utterance in the Conversation, or
- assigned an existing conversation ID, if it replies to an utterance that already has one.

Implementation note: this check also constructs reply-to chains in order to find the root utterance, but the overall time taken is still O(n).
Fixes a bug in add_utterances, which assumed that the first utterance in a Conversation has an utterance ID equal to the conversation ID.
Fixes test errors for utterances where conversation_id is None.
Adds test cases under tests/fill_missing_convo_ids to test this new functionality.
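
The assignment rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the `Utt` dataclass and the `fill_missing_conversation_ids` function are hypothetical stand-ins for ConvoKit's own classes, and only the `__default_conversation__` prefix is taken from the description. Each utterance with a missing ID is placed on a reply-to path at most once, which is what keeps the total work linear.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

DEFAULT_PREFIX = "__default_conversation__"


@dataclass
class Utt:
    # Hypothetical stand-in for an utterance; not ConvoKit's Utterance class.
    id: str
    reply_to: Optional[str] = None
    conversation_id: Optional[str] = None


def fill_missing_conversation_ids(utts: List[Utt]) -> None:
    """Assign a conversation ID to every utterance that lacks one (in place)."""
    by_id: Dict[str, Utt] = {u.id: u for u in utts}
    for utt in utts:
        if utt.conversation_id is not None:
            continue
        path = []      # utterances on this reply-to chain that still lack an ID
        seen = set()   # guards against malformed, cyclic reply chains
        cur = utt
        while True:
            path.append(cur)
            seen.add(cur.id)
            parent = by_id.get(cur.reply_to) if cur.reply_to is not None else None
            if parent is None or parent.id in seen:
                # cur is the root of the chain: build a default ID from it
                convo_id = DEFAULT_PREFIX + cur.id
                break
            if parent.conversation_id is not None:
                # the chain reaches an utterance that already has an ID: reuse it
                convo_id = parent.conversation_id
                break
            cur = parent
        for node in path:
            node.conversation_id = convo_id


utts = [
    Utt("a"),
    Utt("b", reply_to="a"),
    Utt("c", reply_to="b"),
    Utt("x", conversation_id="conv1"),
    Utt("y", reply_to="x"),
]
fill_missing_conversation_ids(utts)
ids = {u.id: u.conversation_id for u in utts}
```

In this toy input, a, b and c form a chain with no conversation ID anywhere, so all three receive the default ID derived from the root a, while y inherits the existing ID of x.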

Motivation and Context

@jpwchang said:

If I remember correctly, I think that the previous decision was that we will not allow Utterances in a Corpus to literally belong to no Conversation (since the package is, after all, Convokit), and so what currently happens is that those Utterances get assigned to a "dummy" Conversation whose ID is None. However, this turns out to cause problems with dumping to JSON. This is because json.dump() will represent None keys as the string "null". But in utterances.jsonl the Utterances' conversation_ids will still be correctly represented as the JSON null type (which gets interpreted as None in python). This behavior is presumably because the JSON standard allows null values but not null keys. As a result, there will be a mismatch between utterances.jsonl and conversations.json. The former file will have Utterances with conversation IDs that do not exist in the latter file, while conversely, the latter file will have a conversation ID that is not used by any Utterance. Needless to say, if the user had previously assigned metadata to the placeholder conversation, this metadata will not properly get reloaded as a result of this mismatch.
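
The key/value asymmetry described above can be reproduced with the standard json module alone. The dictionaries below are made-up miniatures of conversations.json and of one line of utterances.jsonl, not ConvoKit's actual file layout.

```python
import json

# Hypothetical miniature of conversations.json: the placeholder Conversation is keyed by None.
conversations = {None: {"meta": {"topic": "placeholder"}}}
# Hypothetical miniature of one utterances.jsonl line: conversation_id is a value, not a key.
utterance = {"id": "u1", "conversation_id": None}

conversations_json = json.dumps(conversations)  # the None key is written as the string "null"
utterance_json = json.dumps(utterance)          # the None value is written as JSON null

reloaded_conversations = json.loads(conversations_json)
reloaded_utterance = json.loads(utterance_json)

# After the round trip the utterance points at None, but the only conversation key is "null".
mismatch = reloaded_utterance["conversation_id"] not in reloaded_conversations
```

After reloading, the utterance's conversation_id is None again, while the conversation is stored under the string "null", so a lookup by the utterance's conversation ID finds nothing.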

How has this been tested?

Tested through CI.

CornellNLP / ConvoKit

Initialize default conversation ID for utts missing a conversation ID #178
