Cocoon-Data-Transformation/cocoon
Profile TODO #2
Closed by zachary62 1 month ago
zachary62 commented 2 months ago
[x] Decide regex (TODO: verify that the chosen regex is correct; use the LLM to fix it if it is wrong)
[x] Identify Unique key
[x] Identify disguised missing values (TODO: verify that the value actually exists in the column)
[x] Handle long text cells during sampling. Use ...
[x] Identify Numerical Outlier
[x] Higher-order attributes of lon/lat
[ ] Classify strings into categories. Identify value distribution
[x] Visualization for each column. Similar to Kaggle's.
[ ] Build an output JSON that is friendly for future LLM use
[ ] Log history to help debug
[ ] Find more benchmark datasets for each of the cases. Hard: Design a testing framework.
[x] Design a better UI for the final report. E.g., https://rawcdn.githack.com/Cocoon-Data-Transformation/cocoon/d63d5fd6336ced268f90dd0d9966fd7ea6c2a37a/documentation/future_ui.html
[ ] Implement the front end using Streamlit, instead of ipywidget
[ ] Try a weaker/cheaper model (e.g., Gemini Pro) and write repair functions
[ ] Framework issues: multinode is ugly. There is no way to get back to a certain point. The skip node is not implemented correctly.
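
For reference, a rough sketch of how a few of the per-column checks above (regex verification, unique key, confirming that a disguised missing value exists, numerical outliers) could be validated against the data. This is not Cocoon's actual code: the function names, the empty-cell convention (`None` or `""`), and the IQR rule for outliers are all assumptions made for illustration, and it uses only the Python standard library.

```python
import re
from statistics import quantiles


def regex_match_rate(values, pattern):
    """Fraction of non-empty values that fully match `pattern` (hypothetical helper)."""
    compiled = re.compile(pattern)
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if compiled.fullmatch(str(v)))
    return hits / len(non_empty)


def is_unique_key(values):
    """True if no value is missing and no value repeats."""
    no_missing = all(v not in (None, "") for v in values)
    return no_missing and len(set(values)) == len(values)


def confirm_missing_tokens(values, candidates):
    """Keep only the candidate tokens that really occur in the column.

    Comparison is case-insensitive and ignores surrounding whitespace
    (an assumption; a stricter exact match may be preferable).
    """
    present = {str(v).strip().lower() for v in values if v is not None}
    return [c for c in candidates if c.strip().lower() in present]


def iqr_outliers(numbers, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; assumes at least two numbers."""
    q1, _, q3 = quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in numbers if x < low or x > high]


if __name__ == "__main__":
    dates = ["2024-01-05", "2023-12-31", "n/a", ""]
    # A low match rate suggests the proposed regex should be repaired.
    print(regex_match_rate(dates, r"\d{4}-\d{2}-\d{2}"))
    print(is_unique_key([1, 2, 2]))
    # Candidates proposed by the LLM; only those present in the data survive.
    print(confirm_missing_tokens(["12", "N/A", "-999", "7"], ["n/a", "-999", "unknown"]))
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 500]))
```

The idea is that each LLM suggestion is treated as a hypothesis and checked against the column before it is accepted; a match rate below some threshold, or a candidate token that never appears, would trigger the repair step.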