Integrate schema file preparation commit

This PR closes issue #18 and refactors the repository as part of the inventory cleanup. This commit includes the following changes:

Updated the to_xml code so that it converts to a more correct XML format.
Added the corrected dataset profiles and compressed the datasets to reduce the repository size.
Removed the references to the previous Twitter dataset to avoid confusion.
Added an interactive Google Colab notebook for future prototyping of the integrated schema in the schema matching process.
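
To illustrate the kind of conversion the to_xml change above is concerned with, here is a minimal sketch in Python using only the standard library. It is not the code from this commit: the function name `records_to_xml`, the `records`/`record` tag names, and the choice to store the id as an attribute are all assumptions made for the example. It also assumes every dictionary key is already a valid XML element name.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="records", row_tag="record", id_field="id"):
    """Serialize a list of dicts to an XML string, one element per record.

    Hypothetical helper for illustration; names and layout are assumptions.
    """
    root = ET.Element(root_tag)
    for rec in records:
        row = ET.SubElement(root, row_tag)
        # Keep the identifier as an attribute so every record is addressed
        # the same way, whichever dataset it came from.
        if id_field in rec:
            row.set(id_field, str(rec[id_field]))
        for key, value in rec.items():
            if key == id_field:
                continue
            child = ET.SubElement(row, key)
            # Missing values become empty elements; ElementTree escapes
            # characters such as & and < during serialization.
            child.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"id": 1, "title": "A & B", "year": None}]
    print(records_to_xml(sample))
```

Building the tree with `ElementTree` instead of concatenating strings leaves escaping to the library, so the output stays well-formed even when a value contains reserved characters.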

The changes look appropriate from my end. Please review them and leave a comment with your input so that I can make any further changes.

Second, I am also sharing the link to the Google Colab notebook and Drive repository for the integrated schema prototyping. It looks to me like we might need to create the XML files again, probably because of the ids, which in my opinion are inconsistent with the schema mapping logic for the three datasets.

Refer Link


Integrate schema file preparation commit #23