This PR closes issue #18 and refactors the repository to clean up the inventory. It includes the following changes:
Updated the to_xml code to produce more correct XML output.
Added the correct dataset profiles and compressed the datasets to reduce repository size.
Removed references to the previous Twitter dataset to avoid confusion.
Added an interactive Google Colab notebook for prototyping the integrated schema used in the schema matching process.
The changes look appropriate from my end. Please review them and leave a comment with your feedback so I can make any further changes.
Second, I am also sharing the link to the Google Colab notebook and Drive repository for the integrated schema prototyping. It looks to me like we may need to regenerate the XML files, probably because of the id, which in my opinion is inconsistent with the schema mapping logic for the three datasets.
Refer Link