bgawrych commented 3 years ago

Description

When I ran the benchmarks on CPU only, I got NaNs in the results:

2020-10-20 14:11:39,221 - benchmark_utils - INFO - 1 / 1
2020-10-20 14:11:44,714 - benchmark_utils - INFO - Can't pickle local object 'measure_peak_memory_cpu.<locals>.MemoryMeasureProcess'
2020-10-20 14:11:44,732 - benchmark_utils - INFO - Saving results to csv train_fp32_NT_NT/google_en_uncased_bert_base_4_128.csv.
2020-10-20 14:11:44,733 - benchmark_utils - INFO - 
====================            TRAIN - RESULT - SPEED - RESULTS           ====================
2020-10-20 14:11:44,733 - benchmark_utils - INFO - -----------------------------------------------------------------------------------------------
2020-10-20 14:11:44,734 - benchmark_utils - INFO -           Model Name             Batch Size     Seq Length    Latency (ms)      Memory    
2020-10-20 14:11:44,734 - benchmark_utils - INFO - -----------------------------------------------------------------------------------------------
2020-10-20 14:11:44,734 - benchmark_utils - INFO -  google_en_uncased_bert_base         4             128            nan            nan      
2020-10-20 14:11:44,734 - benchmark_utils - INFO - -----------------------------------------------------------------------------------------------
2020-10-20 14:11:44,734 - benchmark_utils - INFO - 
====================        ENVIRONMENT INFORMATION         ====================
2020-10-20 14:11:44,777 - benchmark_utils - INFO - - gluonnlp_version: 1.0.0.dev
- framework_version: 2.0.0
- python_version: 3.8.3
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-10-20
- time: 14:11:44.775970
- fp16: False
- cpu_ram_mb: 191906
- use_gpu: False

The cause was that the MemoryMeasureProcess class was defined inside a function, so it could not be pickled. This PR moves the MemoryMeasureProcess class and the get_cpu_memory function out of the measure_peak_memory_cpu function, which solves the problem.
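
For reference, here is a minimal sketch of the failure mode, not the actual benchmark code. LocalMeasure, ModuleLevelMeasure, make_local_instance and can_pickle are hypothetical names used only for illustration. It assumes the process object gets pickled, which multiprocessing does under the "spawn" and "forkserver" start methods; which start method the benchmark ends up using is not shown here.

```python
import pickle


def make_local_instance():
    # Hypothetical stand-in for the old layout: the class lives inside a
    # function, so its qualified name contains "<locals>" and pickle cannot
    # look it up by name.
    class LocalMeasure:
        def __init__(self, process_id):
            self.process_id = process_id

    return LocalMeasure(1234)


class ModuleLevelMeasure:
    # Hypothetical stand-in for the fixed layout: the class lives at module
    # level, so pickle can resolve it by module and name.
    def __init__(self, process_id):
        self.process_id = process_id


def can_pickle(obj):
    """Return True if obj can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for local objects differs between
        # Python versions, so several are caught here.
        return False


local_ok = can_pickle(make_local_instance())
module_ok = can_pickle(ModuleLevelMeasure(1234))
print("local class picklable:", local_ok)
print("module-level class picklable:", module_ok)
```

Moving the class to module level keeps its behavior unchanged and only changes where pickle can find it, which is why the fix is a pure relocation.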

Checklist

Essentials

[X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
[X] Changes are complete (i.e. I finished coding on this PR)

cc @dmlc/gluon-nlp-team @sxjscience

sxjscience commented 3 years ago

LGTM. Thanks for the fix!

github-actions[bot] commented 3 years ago

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1392/benchmark_fix/index.html

codecov[bot] commented 3 years ago

Codecov Report

Merging #1392 into master will increase coverage by 0.11%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1392      +/-   ##
==========================================
+ Coverage   85.12%   85.24%   +0.11%     
==========================================
  Files          53       53              
  Lines        6959     6959              
==========================================
+ Hits         5924     5932       +8     
+ Misses       1035     1027       -8

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/filtering.py | `78.26% <0.00%> (-4.35%)` | :arrow_down: |
| src/gluonnlp/data/tokenizers/yttm.py | `81.89% <0.00%> (-0.87%)` | :arrow_down: |
| src/gluonnlp/data/loading.py | `83.39% <0.00%> (+5.28%)` | :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8ef4b26...deb6ca9.

dmlc / gluon-nlp

[Fix] Gluon-NLP benchmark script fix #1392
