apache / parquet-java

Apache Parquet Java
https://parquet.apache.org/
Apache License 2.0

Better exception when files are inaccessible #1422

Open asfimport opened 10 years ago

asfimport commented 10 years ago

In some cases the Hadoop FileSystem API throws a NullPointerException when trying to access files that have been moved. We should catch those and give a better error message.

Caused by: java.lang.NullPointerException
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1043)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:211)
    at parquet.hadoop.ParquetInputFormat.listStatus(ParquetInputFormat.java:395)
    at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:443)
    at parquet.hadoop.ParquetInputFormat.getGlobalMetaData(ParquetInputFormat.java:467)

Reporter: Julien Le Dem / @julienledem

Note: This issue was originally created as PARQUET-20. Please see the migration documentation for further details.

asfimport commented 9 years ago

A.Y Dissanayake: Hi @julienledem, can you please explain what you expect from a "better error message"?

Regards, A.Y

asfimport commented 9 years ago

Alex Levenson / @isnotinvain: I would imagine something like "File does not exist: foo/bar/baz.lzo"

asfimport commented 9 years ago

Alex Levenson / @isnotinvain: or in the case of globStatus, "No files found under globPath: foo/bar/*"

asfimport commented 9 years ago

A.Y Dissanayake: Hi @isnotinvain, thank you. I just tried it by adding a log statement. Please check whether it is OK as an initial step.

https://github.com/Yas101/parquet-mr/commit/15ebc6579466fd7932c61b72c6797011cdda4fcb

Regards, A.Y

asfimport commented 9 years ago

Alex Levenson / @isnotinvain: Hi A.Y, thanks for the contribution.

We definitely want this exception to remain fatal, so wrapping it in a try { } catch { log } without re-throwing will actually change the behavior to make this error silently ignored instead of letting it propagate.

Since this is a null pointer exception, hopefully we can just find the line of code that dereferences a null pointer and change it to check for null, then throw a useful exception when it's null. If this NPE is coming from somewhere outside our control, then let's wrap the narrowest piece of code that we can in a

try { } catch(NullPointerException e) { throw new ParquetRuntimeException("File not found...") }
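
To make the suggestion above concrete, here is a minimal, self-contained sketch of both options: an explicit null check and a narrow try/catch that rethrows. It deliberately does not depend on Hadoop or Parquet. The class `GlobLookup`, the method `listOrFail`, the exception `MissingInputException`, and the `lister` function are all hypothetical stand-ins (the lister plays the role of the filesystem call that may return null or throw the NPE), so treat this as an illustration of the pattern rather than the actual patch.

```java
import java.util.List;
import java.util.function.Function;

final class GlobLookup {

    /** Hypothetical stand-in for a descriptive, fatal exception type. */
    public static class MissingInputException extends RuntimeException {
        public MissingInputException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Lists the paths matching globPath using the supplied lister.
     * The lister is an assumption: it models a call we do not control
     * that may return null or throw NullPointerException.
     */
    public static List<String> listOrFail(String globPath,
                                          Function<String, List<String>> lister) {
        List<String> matches;
        try {
            // Keep the try block as narrow as possible: only the call that can throw the NPE.
            matches = lister.apply(globPath);
        } catch (NullPointerException e) {
            // Rethrow so the error stays fatal; keep the original NPE as the cause.
            throw new MissingInputException("No files found under globPath: " + globPath, e);
        }
        // Where the null is visible to our own code, check it explicitly instead.
        if (matches == null || matches.isEmpty()) {
            throw new MissingInputException("No files found under globPath: " + globPath, null);
        }
        return matches;
    }
}
```

Unlike a catch block that only logs, both failure paths here still throw, so the job fails as before but with a message that names the offending path.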