frictionlessdata / tableschema-go

A Go library for working with Table Schema.
MIT License

validate example and capitals.csv test data #73

Closed markgardner10 closed 5 years ago

markgardner10 commented 5 years ago

Running tableschema-go/examples/validate with bad capitals.csv test data does not produce the errors I expected.

Some examples of fat-fingered test data:

id,capital,url
1,39.00,http://www.test.com
2,23.00,http://www.test.de
3,,http://www.test.uk <----- missing value for capital
4,28.00,http://www.test.co.il

produces

2019/04/06 08:12:17 Cast Row: {ID:1 Capital:39 URL:http://www.test.com}
2019/04/06 08:12:17 Cast Row: {ID:2 Capital:23 URL:http://www.test.de}
2019/04/06 08:12:17 Couldn't unmarshal row:[3 http://www.test.uk] err:"strconv.ParseFloat: parsing \"\":

This works as I would have expected.

But missing or random values don't produce any errors. table.Iter() at line 53 only seems to return the first two valid rows, which are processed, and the program then exits without error. No error is returned at line 53 either.

id,capital,url
1,39.00,http://www.test.com
2,23.00,http://www.test.de
3,,,,http://www.test.uk <------ simulate typo
4,28.00,http://www.test.co.il

produces

2019/04/06 08:05:47 Cast Row: {ID:1 Capital:39 URL:http://www.test.com}
2019/04/06 08:05:47 Cast Row: {ID:2 Capital:23 URL:http://www.test.de}

id,capital,url
1,39.00,http://www.test.com
2,23.00,http://www.test.de
3,http://www.test.uk <----- missing capital field
4,28.00,http://www.test.co.il

produces

2019/04/06 08:13:48 Cast Row: {ID:1 Capital:39 URL:http://www.test.com}
2019/04/06 08:13:48 Cast Row: {ID:2 Capital:23 URL:http://www.test.de}

Am I misunderstanding how to deal with missing data in the CSV dataset? Is there a better example of validating CSV input against a schema to catch errors in the CSV data?

Thanks

danielfireman commented 5 years ago

Hi @Arg0naut, your understanding is correct. The validation process is missing one check: the number of fields.

This will match the js implementation. Gonna fix it soon. Thanks for reporting!

danielfireman commented 5 years ago

Actually, the example is correct, but confusing. The problem is that Go's underlying CSV reader already checks the number of fields and returns an error. When that happens, it.Next() returns false and the for loop exits.

During this research, I found that schema.CastRow wasn't explicitly validating the number of columns (sending a push soon).
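An explicit column-count check of the kind mentioned here could look roughly like the sketch below. checkFieldCount is a hypothetical helper written for illustration; it is not the code that went into schema.CastRow, and the error wording is an assumption.

```go
package main

import "fmt"

// checkFieldCount is a hypothetical helper: it reports an error when a
// row does not have exactly as many columns as the schema has fields.
func checkFieldCount(row []string, numFields int) error {
	if len(row) != numFields {
		return fmt.Errorf("row has %d columns, schema declares %d fields", len(row), numFields)
	}
	return nil
}

func main() {
	// Row with the capital column missing entirely.
	fmt.Println(checkFieldCount([]string{"3", "http://www.test.uk"}, 3))
	// Well-formed row.
	fmt.Println(checkFieldCount([]string{"4", "28.00", "http://www.test.co.il"}, 3))
}
```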

I believe most users don't actually want it.Next() to exit like this, since iterators are usually used in combination with schema.CastRow. So I am going to fix this too.
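One way to keep iterating past malformed rows, sketched with the standard library only, is to disable the reader's own field-count check (FieldsPerRecord = -1) and validate each row while casting it. This is an illustration under stated assumptions, not necessarily how the library was changed: capital, castRow, validate and data are hypothetical names, and the struct only mirrors the fields visible in the logs above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// capital mirrors the fields seen in the "Cast Row" log lines (hypothetical type).
type capital struct {
	ID      int
	Capital float64
	URL     string
}

// data mirrors the "missing capital field" example from the report.
const data = "id,capital,url\n" +
	"1,39.00,http://www.test.com\n" +
	"2,23.00,http://www.test.de\n" +
	"3,http://www.test.uk\n" +
	"4,28.00,http://www.test.co.il\n"

// castRow checks the column count first, then converts each field.
func castRow(rec []string) (capital, error) {
	if len(rec) != 3 {
		return capital{}, fmt.Errorf("expected 3 fields, got %d", len(rec))
	}
	id, err := strconv.Atoi(rec[0])
	if err != nil {
		return capital{}, err
	}
	c, err := strconv.ParseFloat(rec[1], 64)
	if err != nil {
		return capital{}, err
	}
	return capital{ID: id, Capital: c, URL: rec[2]}, nil
}

// validate returns the rows that cast cleanly, one error per bad row,
// and a final error only if reading itself fails.
func validate(input string) ([]capital, []error, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = -1 // let rows of any length through; castRow checks them
	if _, err := r.Read(); err != nil { // skip the header line
		return nil, nil, err
	}
	var good []capital
	var bad []error
	for line := 2; ; line++ {
		rec, err := r.Read()
		if err == io.EOF {
			return good, bad, nil
		}
		if err != nil {
			return good, bad, err
		}
		row, err := castRow(rec)
		if err != nil {
			bad = append(bad, fmt.Errorf("line %d: %w", line, err))
			continue // keep going instead of stopping at the first bad row
		}
		good = append(good, row)
	}
}

func main() {
	good, bad, err := validate(data)
	fmt.Println("valid rows:", len(good), "invalid rows:", len(bad), "read error:", err)
	for _, e := range bad {
		fmt.Println(e)
	}
}
```

The design choice here is to report per-row problems separately from read failures, so a single typo does not hide the rows that follow it.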

danielfireman commented 5 years ago

Published release 1.2 with those changes.

Thanks!