TRI-ML / dgp

ML Dataset Governance Policy for Autonomous Vehicle Datasets
https://tri-ml.github.io/dgp/
MIT License

feat: memory efficient dgp wicker reduction #163

Open chrisochoatri opened 2 months ago

chrisochoatri commented 2 months ago

This changes the default behavior of dgp2wicker: it now skips the Spark-based single-instance reduction and reduces manually instead, which is more memory efficient. It also exposes a property for skipping missing files.
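For readers unfamiliar with the pattern, here is a minimal sketch of what a manual reduction can look like. It is not the code in this PR and does not use dgp2wicker's API. The names `reduce_manually`, `iter_existing`, and the `skip_missing_files` flag are hypothetical, chosen only for illustration. The idea is to fold partial results into a single accumulator one at a time, so only the accumulator and the current item are held in memory, and to optionally drop inputs whose files are missing.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

# Sentinel marking "nothing accumulated yet" (hypothetical helper).
_EMPTY = object()


def iter_existing(paths: Iterable[str], skip_missing_files: bool = True) -> Iterator[str]:
    """Yield paths that exist on disk.

    Hypothetical helper: missing paths are skipped when skip_missing_files
    is True; otherwise the first missing path raises FileNotFoundError.
    """
    for path in paths:
        if Path(path).exists():
            yield path
        elif not skip_missing_files:
            raise FileNotFoundError(path)


def reduce_manually(items: Iterable[T], merge: Callable[[T, T], T]) -> Optional[T]:
    """Fold items into one value with a plain loop.

    Only the running accumulator and the current item are alive at once,
    so `items` can be a lazy iterator. Returns None for an empty input.
    """
    acc = _EMPTY
    for item in items:
        acc = item if acc is _EMPTY else merge(acc, item)
    return None if acc is _EMPTY else acc


if __name__ == "__main__":
    # Example: merge per-sample sets of class ids into one set.
    partials = ({1, 2}, {2, 3}, {5})
    print(reduce_manually(iter(partials), lambda a, b: a | b))
```

Because `reduce_manually` accepts any iterable, the partial results can be produced lazily rather than materialized as a full list first. How much memory this saves in practice depends on how the partial results are generated.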

