Closed ambujpunn closed 6 years ago
When working with (potentially) large text files, it's a good idea to read the file line by line to get all the benefits mentioned in the README. If you know your file isn't too big to handle, or if you don't care, you can always use this initializer to create a CSVImporter object from a String. To do this, you need to read the contents of the CSV file yourself. That gives you a String object which you can use to get the total number of lines. The code could look something like this:
let contentString = try! String(contentsOfFile: "path/to/your/file.csv")
let totalLinesCount = contentString.components(separatedBy: CharacterSet.newlines).count
let importer = CSVImporter<[String: String]>(contentString: contentString)
You can also see this example in the tests here.
The above code is a workaround though and might not work perfectly depending on the line endings of your file. As you can see here, we already have the lines somewhere within CSVImporter, but the property isn't public, so you can't read it.
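To illustrate the line-ending caveat: splitting on CharacterSet.newlines yields an extra empty component between "\r" and "\n" and another one after a trailing newline, so the count can come out too high. A minimal sketch of a more forgiving count is below. The helper name `countLines(in:)` is made up for this example and is not part of CSVImporter, and it still miscounts files that contain line breaks inside quoted fields.

```swift
/// Hypothetical helper (not part of CSVImporter): counts the non-empty
/// lines of a string, treating "\n", "\r" and "\r\n" as one line ending each.
func countLines(in text: String) -> Int {
    // Swift treats "\r\n" as a single Character for which `isNewline` is true,
    // and `split` drops empty subsequences by default, so blank lines and a
    // trailing newline do not inflate the count.
    return text.split(whereSeparator: { $0.isNewline }).count
}
```

With this, `countLines(in: contentString)` could stand in for the `components(separatedBy:)` call in the workaround, assuming blank lines should not be counted.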
I think to add official support for the total number of lines, we could add a public computed property to CSVImporter that returns an Optional, which could look like this:
public var totalDataLinesCount: Int? {
    // Only a String-based source holds all lines in memory, so only then is the total known.
    guard let stringSource = source as? StringSource else { return nil }
    return stringSource.lines.count
}
It would only work if you initialize CSVImporter with a String, but it would make sure you don't get into trouble with line endings.
@ambujpunn Would you like to add this feature with a test and send a PR? 😃
@Dschee Wouldn't this only work when loading an entire CSV file into a huge string? Ideally, we'd like to keep and extend the awesome behavior of CSVImporter, which is to read line by line rather than storing everything somewhere first.
Well, there's a logical problem there though, isn't there? I mean, if you wanna read a file "line by line" then you can't know how many lines the file has since you haven't read the entire file yet, no? What you could do is guess the total number of lines based on the file size. But as this is not accurate by any means, I'd rather not include such a feature in CSVImporter. It's gonna result in this.
If you have any other idea of how we could do this, then please, explain and I'll consider adding it.
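For reference, the file-size guess mentioned above could be sketched roughly like this. The function name and parameters are hypothetical and not part of CSVImporter. It assumes the sample is taken from the start of the file, that lines end in "\n" or "\r\n", and that the sampled lines are about as long as the rest, which is exactly why the result is only an estimate.

```swift
import Foundation

/// Hypothetical helper (not part of CSVImporter): estimates the number of
/// lines from the file size and the average line length in a sample.
func estimatedLineCount(fileSizeInBytes: Int, sample: Data) -> Int? {
    // Count the "\n" bytes in the sample; "\r\n" endings also contain one "\n".
    let newlineCount = sample.reduce(0) { $1 == UInt8(ascii: "\n") ? $0 + 1 : $0 }
    // Without a complete line in the sample there is nothing to extrapolate from.
    guard newlineCount > 0 else { return nil }
    let averageBytesPerLine = Double(sample.count) / Double(newlineCount)
    return Int((Double(fileSizeInBytes) / averageBytesPerLine).rounded())
}
```

The estimate drifts as soon as line lengths vary across the file, so a progress bar driven by it may jump or stall near the end.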
Just a suggestion, but perhaps a separate API could be added that would iterate through the file in chunks, so everything wouldn't need to be in memory at once, just counting the line endings (not within quoted strings).
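A rough sketch of what such a chunked counting pass might look like is below. Nothing here is existing CSVImporter API: the function name, the `chunkSize` parameter and its default are assumptions for illustration. It also assumes an ASCII-compatible encoding such as UTF-8, a double quote as the quoting character, and that blank lines should be skipped.

```swift
import Foundation

/// Hypothetical helper (not part of CSVImporter): counts non-empty CSV lines
/// by reading the file in chunks, ignoring line endings inside quoted fields.
func countCSVLines(atPath path: String, chunkSize: Int = 64 * 1024) -> Int? {
    guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
    defer { handle.closeFile() }

    let quote = UInt8(ascii: "\"")
    let lineFeed = UInt8(ascii: "\n")
    let carriageReturn = UInt8(ascii: "\r")

    var lineCount = 0
    var insideQuotes = false    // state is kept across chunk boundaries
    var lineHasContent = false  // avoids counting blank lines and the "\n" of "\r\n"

    while true {
        let chunk = handle.readData(ofLength: chunkSize)
        if chunk.isEmpty { break }  // end of file

        for byte in chunk {
            if byte == quote {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                insideQuotes.toggle()
                lineHasContent = true
            } else if !insideQuotes && (byte == lineFeed || byte == carriageReturn) {
                if lineHasContent {
                    lineCount += 1
                    lineHasContent = false
                }
            } else {
                lineHasContent = true
            }
        }
    }

    // Count a final line that is not terminated by a line ending.
    if lineHasContent { lineCount += 1 }
    return lineCount
}
```

Only one chunk is held in memory at a time, but the file is still read in full before the actual import starts.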
Yeah, that could be possible. But it would still mean that the file is traversed twice, once for checking the total number of lines and once for actually processing the data. Of course, in some cases this might not be a problem, so as long as the documentation is very clear on the performance drawback, I'd be happy to merge this feature into CSVImporter. Any volunteers? Cause I won't have much time in the coming months, maybe sometime in December ...
I'm closing this feature request as not many people seemed to be interested in it and there's a workaround available by checking the file manually. Feel free to post a PR if you want this feature and are ready to implement it yourself.
Is there a good way to track the progress of an import while it is happening? Right now, it is only possible to see the number of lines imported so far, but a UIProgressView needs an end value, i.e. the total number of lines. With that, we could simply divide the current number of lines by the total number of lines. However, I understand that CSVImporter imports the file one line at a time, which makes it hard to get the total until the end of the import. Is there any workaround for this?
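The division described here could be sketched as follows, assuming the total number of lines has been obtained some other way (for example by counting the lines up front). `progressFraction` is a hypothetical helper, not part of CSVImporter.

```swift
/// Hypothetical helper (not part of CSVImporter): turns an imported-lines
/// count and a known total into a fraction between 0 and 1.
func progressFraction(importedLines: Int, totalLines: Int) -> Float {
    // Without a usable total, report no progress instead of dividing by zero.
    guard totalLines > 0 else { return 0 }
    let fraction = Float(importedLines) / Float(totalLines)
    // Clamp in case the total was miscounted and the import overshoots it.
    return min(max(fraction, 0), 1)
}
```

The result is a Float between 0 and 1, which is the range UIProgressView's `progress` property expects. The imported count would come from whatever progress callback the importer provides.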