Closed MostHated closed 4 years ago
No matter what you do, I don't think any package will be able to load this as a data frame right out of the box.
My suggestion is that you clean up your data first, using regular expression substitution to remove comments, blank lines and non-tabular data. In the same pass, you could also use a regex to rewrite all the files so they use the same delimiter.
I guess you could do this directly in Go, or with sed, awk and UNIX pipes.
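For what it's worth, here is a minimal sketch of that cleanup step in Go, using only the standard library. It rests on assumptions the thread does not confirm: that comment lines start with a known prefix (passed in as the hypothetical `commentPrefix` parameter), that the non-tabular lines are the ones starting with `HCONTEXT`, and that every column delimiter contains at least one tab. The `normalize` function name is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// delimiter matches any run of spaces and tabs that contains at least one tab.
// A run of plain spaces (as found inside a description) does not match.
var delimiter = regexp.MustCompile(`[ \t]*\t[ \t]*`)

// normalize drops blank lines, comment lines and HCONTEXT lines, then
// rewrites every column delimiter as a single tab.
// commentPrefix is an assumption: the thread does not show the comment syntax.
func normalize(input, commentPrefix string) string {
	var out strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		// Trimming also removes a trailing delimiter, so a row with an
		// empty last column comes out with three fields instead of four.
		line := strings.TrimSpace(scanner.Text())
		if line == "" ||
			strings.HasPrefix(line, commentPrefix) ||
			strings.HasPrefix(line, "HCONTEXT") {
			continue
		}
		out.WriteString(delimiter.ReplaceAllString(line, "\t"))
		out.WriteByte('\n')
	}
	return out.String()
}

func main() {
	// Made-up sample input, not taken from the real files.
	sample := "# generated file\n" +
		"HCONTEXT app \"App\" \"Main context\"\n" +
		"\n" +
		"sym.open \t\tOpen\t Open a file \t alt+z ctrl+t\n"
	fmt.Print(normalize(sample, "#"))
}
```

The output is plain tab-separated text, so it should be loadable by any reader that accepts a tab delimiter.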
Good luck!
Thanks for the suggestion. I was able to do just that.
Hey there. I have a folder full of files, and unfortunately the application that generates them doesn't keep things very consistent. Examples will be below. From what I can tell, the first 3 lines are always comments. The next section starts with HCONTEXT; there might be just one HCONTEXT entry or there might be several. There are not always additional comments before the sets of data, but in the second example there are. The sets of data are always laid out the same way, in four columns: an application symbol, a label, a description, and a list of 0 to N single keys or key combinations (alt+z, ctrl+t, etc.) delimited by spaces.
The main issue is that the delimiter between the four columns is not consistent. The layout of the data is always the same, but the columns might be separated by a single tab (\t), two tabs, three tabs, a single \t and a space (\s), two spaces and a tab (\s\t\s, or \s\s\t), etc.
If someone would not mind letting me know whether this library can help me with this, I would greatly appreciate it. If not, does anyone happen to know of one that might? I was not sure what search terms to use when looking; I tried "parse text", "csv", "multiple delimiters", and various other things. I could use multiple libraries and do it in separate steps if I have to, but I am hoping to keep it as performant as possible at runtime.
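To make the delimiter problem concrete, here is a rough Go sketch of splitting one such row. It assumes every column delimiter contains at least one tab and that the first three columns never contain a tab themselves; the `Binding` type and `parseRow` function are hypothetical names, not part of any library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Binding is a hypothetical name for one row of the data section.
type Binding struct {
	Symbol      string
	Label       string
	Description string
	Keys        []string // zero or more keys or key combinations
}

// columnSep matches a run of spaces and tabs that contains at least one tab.
var columnSep = regexp.MustCompile(`[ \t]*\t[ \t]*`)

// parseRow splits a line into at most four columns, then splits the last
// column (the key list) on whitespace.
func parseRow(line string) (Binding, error) {
	cols := columnSep.Split(strings.TrimSpace(line), 4)
	if len(cols) < 3 {
		return Binding{}, fmt.Errorf("expected at least 3 columns, got %d", len(cols))
	}
	b := Binding{Symbol: cols[0], Label: cols[1], Description: cols[2]}
	if len(cols) == 4 {
		b.Keys = strings.Fields(cols[3])
	}
	return b, nil
}

func main() {
	// Made-up sample row, not taken from the real files.
	b, err := parseRow("sym.open \t\tOpen\t Open a file \t alt+z ctrl+t")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", b)
}
```

Capping the split at four pieces means the key list is left whole and then broken up by `strings.Fields`, so a row with no keys simply ends up with an empty `Keys` slice.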
Thanks! -MH