Open dave08 opened 1 week ago
Hi. The library we're using now only has String and Char options for the delimiter. Is your file a CSV/TSV, or just a plain txt with some special format you want to parse?
Say I have (output from kubectl get namespaces):
NAME             STATUS   AGE      LABELS
argo-events      Active   2y77d    app.kubernetes.io/instance=argo-events,kubernetes.io/metadata.name=argo-events
argo-workflows   Active   2y77d    app.kubernetes.io/instance=argo-workflows,kubernetes.io/metadata.name=argo-workflows
argocd           Active   5y18d    kubernetes.io/metadata.name=argocd
beta             Active   4y235d   kubernetes.io/metadata.name=beta
Then I have multiple spaces as delimiters...
In some command line outputs, I have two words in one column:
NAME                        CLUSTER      CDS      LDS      EDS      RDS      ECDS       ISTIOD                           VERSION
foo-5fcd67944f-2t97k.dev    Kubernetes   SYNCED   SYNCED   SYNCED   SYNCED   NOT SENT   istiod-1-18-7-dbcdbb5f4-nth9n    1.18.7
foo-6f8bf4c9b9-qrwf9.prod   Kubernetes   SYNCED   SYNCED   SYNCED   SYNCED   NOT SENT   istiod-1-16-7-6d46d45875-gxtzw   1.16.7
Like that NOT SENT... that's where a regex can help. It's not just tabs, it's a bunch of spaces.
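To make the difference concrete, here is a minimal sketch in plain Kotlin (no DataFrame API involved). It assumes the real output pads columns with at least two spaces, as described above. The names `line`, `naive`, and `cells` are illustrative only.

```kotlin
// One row from the output above. Assumption: columns are separated by two or
// more spaces, while the value "NOT SENT" contains a single space.
val line = "foo-5fcd67944f-2t97k.dev    Kubernetes   SYNCED   SYNCED   SYNCED   SYNCED   NOT SENT   istiod-1-18-7-dbcdbb5f4-nth9n    1.18.7"

// Splitting on any run of whitespace breaks "NOT SENT" into two cells.
val naive = line.trim().split(Regex("\\s+"))

// Splitting on runs of two or more whitespace characters keeps it whole.
val cells = line.trim().split(Regex("\\s{2,}"))

fun main() {
    println(naive.size) // 10 cells: one too many
    println(cells.size) // 9 cells: matches the 9 header columns
    println(cells[6])   // NOT SENT
}
```

This only works when every column gap is at least two spaces wide. A value that happens to contain a double space would still be split.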
Also, how would you parse Markdown tables (or similar)...? Unless the library trims all those extra spaces... but I guess with Markdown there might be more complications than just a delimiter.
Good questions indeed. I think such tables should be parsed by readDelimStr in the future. For now I can only suggest something like this for Markdown:
fun String.markdownCells() = trim('|').split("|").map { it.trim() }
val s = """
| Month | Savings |
| -------- | ------- |
| January | $250 |
| February | $80 |
| March | $420 |""".trimIndent()
val lines = s.lineSequence()
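// Skip the header and separator rows, then split each remaining line into cells
// named after the header row.
lines.drop(2).toList().toDataFrame()
    .split { value }.by { it.markdownCells() }
    .into(lines.first().markdownCells())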
I think that's a bit of an advanced technique for most people with this kind of use case... and it involves parsing in two steps...
I wonder if some kind of readDSL would be better here... it could possibly work by line and give helpers for extracting the titles and values?
Please share the API you'd like, or usage examples you have in mind. Maybe something like this could be added.
Maybe, since this function is specifically for reading delimited text, it would be useful to have an overload that takes a Regex as the delimiter. This would help with command-line output tables, which are usually space-separated but sometimes have a single space inside a column value, so I need to use "\s\s+" to read them in correctly.
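As a rough illustration of what such a Regex-based reader might do, here is a sketch in plain Kotlin. `parseDelimited` is a hypothetical helper, not part of the DataFrame library, and `sample` is a shortened version of the output quoted earlier.

```kotlin
// Hypothetical helper (not a DataFrame API): parses delimited text using a
// Regex delimiter and returns one header-keyed map per data row.
fun parseDelimited(text: String, delimiter: Regex): List<Map<String, String>> {
    val lines = text.lines().map { it.trim() }.filter { it.isNotEmpty() }
    if (lines.isEmpty()) return emptyList()
    val header = lines.first().split(delimiter)
    return lines.drop(1).map { line ->
        val cells = line.split(delimiter)
        // Pair each header with its cell; missing trailing cells become "".
        header.mapIndexed { i, name -> name to cells.getOrElse(i) { "" } }.toMap()
    }
}

// Shortened sample: columns are padded with 2+ spaces, "NOT SENT" has one space.
val sample = """
    NAME       CDS      ECDS       VERSION
    foo.dev    SYNCED   NOT SENT   1.18.7
    foo.prod   SYNCED   NOT SENT   1.16.7
""".trimIndent()

fun main() {
    val rows = parseDelimited(sample, Regex("\\s\\s+"))
    println(rows[0]["ECDS"])    // NOT SENT
    println(rows[1]["VERSION"]) // 1.16.7
}
```

A library overload would presumably return a DataFrame and infer column types. The sketch stops at strings to keep the delimiter handling in focus.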