google-research / exoplanet-ml

Machine learning models and utilities for exoplanet science.
Apache License 2.0
296 stars 117 forks

Bazel Build Issues #6

Open rathjo14 opened 5 years ago

rathjo14 commented 5 years ago

Following the AstroNet README as closely as possible, I have been running into major problems in the Bazel build phase.

Bazel Version: 0.24.1
TensorFlow Version: 1.14.0

When running:

bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/...

ERROR: /private/var/tmp/_bazel_rathjo14/d5d70ed4975039d87f5635d66a43ed87/external/com_google_protobuf/protobuf_deps.bzl:18:9: no such package '': BUILD file not found in any of the following directories.

Looking into the file mentioned in the error, here is what I see (lines 17-23):

if not native.existing_rule("six"):
    http_archive(
        name = "six",
        build_file = "@//:six.BUILD",
        sha256 = "105f8d68616f8248e24bf0e9372ef04d3cc10104f1980f54d57b2ce73a5ad56a",
        urls = ["https://pypi.python.org/packages/source/s/six/six-1.10.0.tar.gz#md5=34eed507548117b2ab523ab14b2f8b55"],
    )
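One possible reading of this error, sketched in Python: the label `@//:six.BUILD` names a file in the root package of the main workspace, so Bazel expects a BUILD file in the workspace root, and `no such package ''` appears to be the complaint that none exists there. The helper names `split_label` and `package_has_build_file` are made up for illustration, and the parsing only covers the simple label forms seen in this thread.

```python
import os


def split_label(label):
    """Split a Bazel label such as '@repo//pkg:target' into (repo, package, target).

    Illustrative only: handles the simple forms seen in this thread.
    """
    repo, sep, rest = label.partition("//")
    if not sep:
        raise ValueError("expected an absolute label containing '//': %r" % label)
    package, colon, target = rest.partition(":")
    if not colon:
        # '//foo/bar' is shorthand for '//foo/bar:bar'.
        target = package.rsplit("/", 1)[-1]
    return repo.lstrip("@"), package, target


def package_has_build_file(workspace_root, package):
    """Return True if the package directory contains BUILD or BUILD.bazel."""
    package_dir = os.path.join(workspace_root, package)
    return any(
        os.path.isfile(os.path.join(package_dir, name))
        for name in ("BUILD", "BUILD.bazel")
    )


repo, package, target = split_label("@//:six.BUILD")
# An empty repo means the main workspace; an empty package means its root.
print(repr(repo), repr(package), repr(target))
```

Calling `package_has_build_file` with the path of the cloned repository and the empty package string would show whether the workspace root has the BUILD file that this label requires.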
adi-panda commented 5 years ago

Hello, Did you find a solution to the issue?

jalalirs commented 4 years ago

I am facing the same problem. Any solution?

jalalirs commented 4 years ago

OK. I am not familiar with Bazel syntax at all, but after a long struggle and a lot of searching and reading, the following solved the problem.

Modify the last part of the BUILD file in the light_curve directory:

load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)

Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file

http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()
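Because the `sha256` attribute pins the exact archive, it can be worth checking a locally downloaded copy against it before editing WORKSPACE. A minimal Python sketch: the function name and the idea of passing a local file path on the command line are assumptions, and the expected digest is simply the one quoted above.

```python
import hashlib
import sys

# Digest copied from the http_archive rule above.
EXPECTED_SHA256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_sha256.py <downloaded archive>
    actual = sha256_of_file(sys.argv[1])
    print("match" if actual == EXPECTED_SHA256 else "MISMATCH: " + actual)
```

If the digests differ, Bazel would reject the download as well, so this only saves a round trip.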

ritwik12 commented 4 years ago

@jalalirs The above solution worked for py_proto_library, but now proto_library gives an error saying: no such attribute 'cc_api_version' in 'proto_library' rule. Has anyone faced this?

jalalirs commented 4 years ago

> @jalalirs The above solution worked for py_proto_library, but now proto_library gives an error saying: no such attribute 'cc_api_version' in 'proto_library' rule. Has anyone faced this?

Just remove the cc_api_version attribute.
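If the attribute shows up in more than one BUILD file, the removal can be scripted. A rough sketch, assuming each occurrence sits on its own line in the form `cc_api_version = 2,`; that layout is a guess, and the rule in the example string is made up for illustration rather than copied from the repository.

```python
import re

# Matches a whole line like '    cc_api_version = 2,' including its newline.
_CC_API_VERSION_LINE = re.compile(
    r"^[ \t]*cc_api_version[ \t]*=[ \t]*\d+[ \t]*,?[ \t]*\r?\n", re.MULTILINE
)


def strip_cc_api_version(build_text):
    """Return BUILD file text with any cc_api_version attribute lines removed."""
    return _CC_API_VERSION_LINE.sub("", build_text)


# Hypothetical rule, used only to exercise the function.
example = '''proto_library(
    name = "example_proto",
    srcs = ["example.proto"],
    cc_api_version = 2,
)
'''
print(strip_cc_api_version(example))
```

The function only rewrites text in memory; reading and writing the actual BUILD files is left out on purpose so that nothing is changed by accident.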

ritwik12 commented 4 years ago

@jalalirs I did. Then it gave numerous other errors.

//astronet/astro_cnn_model:astro_cnn_model_test                          FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_cnn_model/astro_cnn_model_test/test.log
//astronet/astro_fc_model:astro_fc_model_test                            FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_fc_model/astro_fc_model_test/test.log
//astronet/astro_model:astro_model_test                                  FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/astro_model/astro_model_test/test.log
//astronet/ops:dataset_ops_test                                          FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/dataset_ops_test/test.log
//astronet/ops:input_ops_test                                            FAILED in 2.9s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/input_ops_test/test.log
//astronet/ops:metrics_test                                              FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astronet/ops/metrics_test/test.log
//astrowavenet:astrowavenet_model_test                                   FAILED in 6.1s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/astrowavenet_model_test/test.log
//astrowavenet/data:base_test                                            FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/astrowavenet/data/base_test/test.log
//light_curve:kepler_io_test                                             FAILED in 6.2s
  /private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/execroot/__main__/bazel-out/darwin-fastbuild/testlogs/light_curve/kepler_io_test/test.log

Executed 9 out of 23 tests: 14 tests pass and 9 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout
INFO: Build completed, 9 tests FAILED, 10 total actions
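With this many failures, it can help to pull the failing targets out of the summary before opening each test.log. A small Python sketch, assuming the summary lines keep the `//target   FAILED in N.Ns` shape shown above; the function name is made up, and the sample reuses two target lines from this log with a placeholder in place of the log path.

```python
import re

# A summary line: target label, padding, then 'FAILED in <seconds>s'.
_FAILED_LINE = re.compile(r"^(//\S+)\s+FAILED in ([0-9.]+)s\s*$")


def failed_targets(summary_text):
    """Return (target, seconds) pairs for each 'FAILED in' line of a test summary."""
    results = []
    for line in summary_text.splitlines():
        match = _FAILED_LINE.match(line)
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results


sample = """\
//astronet/ops:input_ops_test                                            FAILED in 2.9s
  (path to test.log)
//light_curve:kepler_io_test                                             FAILED in 6.2s
"""
for target, seconds in failed_targets(sample):
    print(target, seconds)
```

The indented path lines are skipped because they do not start with `//`.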
zoe4cs commented 4 years ago

> @jalalirs The above solution worked for py_proto_library, but now proto_library gives an error saying: no such attribute 'cc_api_version' in 'proto_library' rule. Has anyone faced this?

I am facing the same problem. Which versions of the packages are you using?

ritwik12 commented 4 years ago

@zoe4cs bazel 2.0.0

ritwik12 commented 4 years ago

@zoe4cs Any luck here?

jalalirs commented 4 years ago

I will fork the project tonight and commit my changes. I don’t remember all the modifications I made, but let's see if my version works for you. Wait for my reply.

ritwik12 commented 4 years ago

@jalalirs OK, sure, thanks :)

zoe4cs commented 4 years ago

> @zoe4cs Any luck here?

I guess the versions of Bazel and TensorFlow are causing the problem, but I haven't found a solution.

jalalirs commented 4 years ago

So here is what I did to make it run.

First, I ran it on a TensorFlow image from Docker Hub, using the tag 2.0.1-gpu-py3-jupyter:

https://hub.docker.com/r/tensorflow/tensorflow

In the container, I installed Bazel, cloned this repository, and made the following modifications.

Modify the last part of the BUILD file in the light_curve directory:

load("@com_google_protobuf//:protobuf.bzl", "py_proto_library")

py_proto_library(
    name = "light_curve_py_pb2",
    srcs_version = "PY2AND3",
    srcs = glob(["proto/*.proto"]),
    deps = [
        "@com_google_protobuf//:protobuf_python",
    ],
)

Also in the WORKSPACE file, I updated the ProtoBuf library at the end of the file

http_archive(
    name = "com_google_protobuf",
    sha256 = "60d2012e3922e429294d3a4ac31f336016514a91e5a63fd33f35743ccfe1bd7d",
    strip_prefix = "protobuf-3.11.0",
    urls = ["https://github.com/protocolbuffers/protobuf/archive/v3.11.0.zip"],
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

I ran the tests with the following command:

bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/home/exoplanet-ml/exoplanet-ml/

https://pbs.twimg.com/media/EOGoWSOXUAUy0Yj?format=jpg&name=large

ritwik12 commented 4 years ago

@jalalirs They were all version issues with tensorflow and tensorflow_probability. Working versions:

tensorboard            1.13.1    
tensorflow             1.13.2    
tensorflow-estimator   1.13.0    
tensorflow-probability 0.6.0 

Two test cases are still failing, as shown below, and I don't know why. From the logs I can see:

======================================================================
ERROR: testBadLabelIdsRaisesValueError (__main__.BuildDatasetTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ritsharm/6c31e64f0da40b5f15aa6c8979a9a35d/sandbox/darwin-sandbox/91/execroot/__main__/bazel-out/darwin-fastbuild/bin/astronet/ops/dataset_ops_test.runfiles/__main__/astronet/ops/dataset_ops_test.py", line 231, in setUp
    self._file_pattern = os.path.join(FLAGS.test_srcdir, _TEST_TFRECORD_FILE)
  File "/Users/ritsharm/git/google-research/lib/python3.7/site-packages/absl/flags/_flagvalues.py", line 473, in __getattr__
    raise AttributeError(name)
AttributeError: test_srcdir

jalalirs commented 4 years ago

You need to pass the data source by adding the following parameter to the run command:

--test_arg=--test_srcdir=
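Spelled out as a full invocation, the pieces fit together roughly as follows. The sketch only assembles the argument list (which could then be handed to `subprocess.run`); the source path is a placeholder to be replaced with the location of the cloned repository, the function name is made up, and the target list is copied from the command earlier in the thread.

```python
import shlex

# Target patterns copied from the test command at the top of the thread.
TARGETS = [
    "astronet/...",
    "astrowavenet/...",
    "light_curve/...",
    "tf_util/...",
    "third_party/...",
]


def bazel_test_command(test_srcdir, targets=TARGETS):
    """Build the argv list for `bazel test`, forwarding --test_srcdir to each test."""
    # --test_arg hands its value through to the test's own command line.
    return ["bazel", "test"] + list(targets) + ["--test_arg=--test_srcdir=" + test_srcdir]


# Placeholder path: substitute the directory that contains the repository.
command = bazel_test_command("/path/to/exoplanet-ml/exoplanet-ml/")
print(" ".join(shlex.quote(part) for part in command))
```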

ritwik12 commented 4 years ago

@jalalirs Thanks a lot for that, but even after using

bazel test astronet/... astrowavenet/... light_curve/... tf_util/... third_party/... --test_arg=--test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml/

it gives this error:

usage: astro_cnn_model_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b]
                               [-k TESTNAMEPATTERNS]
                               [tests [tests ...]]
astro_cnn_model_test.py: error: unrecognized arguments: --test_srcdir=/Users/ritsharm/git/exoplanet-ml/exoplanet-ml

jalalirs commented 4 years ago

You probably need TensorFlow 2.

ritwik12 commented 4 years ago

@jalalirs But with TensorFlow 2 lots of other things are breaking :(

ritwik12 commented 4 years ago

@jalalirs TensorFlow 2.0 is not supported, because this project's code uses tf.contrib:

tf.contrib.data.parallel_interleave(
AttributeError: module 'tensorflow' has no attribute 'contrib'

and tf.contrib was removed in TF 2.

Can you please check which version of TensorFlow you are using?
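A quick guard for this kind of mismatch, as a sketch only: it takes a version string (such as the value of `tf.__version__`) and guesses whether that release ships `tf.contrib`. The function name is made up, and the rule it encodes (present in 1.x, absent from 2.x onward) is the assumption stated in the docstring.

```python
def has_tf_contrib(tf_version):
    """Guess whether a TensorFlow release ships tf.contrib, from its version string.

    Assumption: tf.contrib is present in 1.x releases and absent from 2.x onward.
    """
    major = int(tf_version.split(".")[0])
    return major == 1


# With TensorFlow importable, hasattr(tf, "contrib") would be the direct check.
for version in ("1.13.2", "1.15.0", "2.0.1"):
    print(version, has_tf_contrib(version))
```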

jalalirs commented 4 years ago

You are actually right, I am using 1.15:

import tensorflow as tf
tf.__version__
'1.15.0'

ritwik12 commented 4 years ago

@jalalirs

I got it working. It was all version issues.

tensorboard            1.15.0    
tensorflow             1.15.0    
tensorflow-estimator   1.15.1    
tensorflow-probability 0.8.0  

The versions above pass all tests.
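To confirm that an environment matches these pins before running the tests, something like the following could be used. It is a sketch under a few assumptions: the function names are made up, `importlib.metadata` needs Python 3.8 or newer (on 3.7 the `importlib_metadata` backport would have to be installed), and the pinned versions are the ones listed above.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Versions reported above as passing all tests.
PINNED = {
    "tensorboard": "1.15.0",
    "tensorflow": "1.15.0",
    "tensorflow-estimator": "1.15.1",
    "tensorflow-probability": "0.8.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (wanted, found)} for every package whose version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
    print("%s: want %s, found %s" % (name, wanted, found))
```

No output means every pinned package is installed at the listed version.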

ritwik12 commented 4 years ago

@jalalirs Did the steps work for you all the way to the end, as mentioned in this?

For me, the prediction step, which is the last step, throws lots of exceptions:

# Generate a prediction for a new TCE.
bazel-bin/astronet/predict \
  --model=AstroCNNModel \
  --config_name=local_global \
  --model_dir=${MODEL_DIR} \
  --kepler_data_dir=${KEPLER_DATA_DIR} \
  --kepler_id=11442793 \
  --period=14.44912 \
  --t0=2.2 \
  --duration=0.11267 \
  --output_image_file="${HOME}/astronet/kepler-90i.png"

Is there any code change needed?

jalalirs commented 4 years ago

@ritwik12 No, I just ran the test command. After that I started using some of the modules directly. I am working on it intermittently, so I haven't done any training yet.

I am an amateur in the astronomy field and am just starting to get my hands dirty with its data. For this specific project, though, I am planning to skip Bazel entirely and run the code through direct Python calls.

ritwik12 commented 4 years ago

OK, got it. Thanks a lot :) @jalalirs

muHashh commented 3 years ago

Leaving a modified version here for people who happen to stumble upon this thread. At the top of the README I've linked the Docker image that I used to get it working with my AMD Vega 56 and ROCm. Make sure to also follow the ROCm Docker install guide. If you have issues installing rocm-dkms, switch to an older kernel version. I was running 5.8 (on Ubuntu 20 LTS, which is the recommended distro) and installing 5.6 fixed the issue.