NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0

[INF] Documentation improvement #867

Open viswa-nvidia opened 1 year ago

viswa-nvidia commented 1 year ago

Description

We want to focus on improving our documentation.

We brainstormed and collected the following ideas (not yet prioritised):

General

Quick start example documentation

How to write an operator

Inline Documentation (Coverage)

Docstring Coverage (March 28th):

Some previous attempts: https://github.com/NVIDIA-Merlin/Merlin/issues/788 https://github.com/NVIDIA-Merlin/Merlin/issues/795 https://github.com/NVIDIA-Merlin/Merlin/issues/794

bschifferer commented 1 year ago

Docstring Coverage (March 28th):

Merlin Models: 40%
Transformers4Rec: 41%
Merlin Systems: 80%
Merlin Core: 80%
DataLoader: 78%
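For readers who want to reproduce figures like these, here is a minimal sketch of how a docstring coverage number can be computed with Python's standard `ast` module. The thread does not say which tool produced the reports, so the counting rule below (one documentable node for the module, plus one per class and per function or method) is an assumption and may differ from the tool's own rules. The names `docstring_coverage` and `SAMPLE` are hypothetical.

```python
import ast


def docstring_coverage(source: str) -> tuple[int, int, int]:
    """Return (total, miss, cover) for the source text of one module.

    Assumption: the module, every class, and every function or method
    each count as one documentable node.
    """
    tree = ast.parse(source)
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    cover = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    total = len(nodes)
    return total, total - cover, cover


# Hypothetical sample module: the module and one method are documented,
# the class and the helper function are not.
SAMPLE = '''"""Module docstring."""

class Block:
    def call(self, x):
        """Apply the block."""
        return x

def helper():
    return 1
'''

total, miss, cover = docstring_coverage(SAMPLE)
print(f"{total} {miss} {cover} {100 * cover / total:.0f}%")
```

The printed columns follow the same order as the reports in this comment: Total, Miss, Cover, Cover%.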

Merlin Models:
============================ Coverage for /workspace/01_MerlinDev/62_DocStrings/models/merlin/ ============================
--------------------------------------------------------- Summary ---------------------------------------------------------
Name Total Miss Cover Cover%
datasets/synthetic.py 4 1 3 75%
datasets/advertising/criteo/dataset.py 5 3 2 40%
datasets/ecommerce/aliccp/dataset.py 4 2 2 50%
datasets/ecommerce/booking/dataset.py 5 1 4 80%
datasets/ecommerce/dressipi/dataset.py 3 2 1 33%
datasets/entertainment/movielens/dataset.py 7 4 3 43%
models/config/schema.py 11 6 5 45%
models/tf/loader.py 5 2 3 60%
models/tf/blocks/cross.py 7 5 2 29%
models/tf/blocks/dlrm.py 2 1 1 50%
models/tf/blocks/experts.py 19 12 7 37%
models/tf/blocks/interaction.py 13 8 5 38%
models/tf/blocks/mlp.py 8 6 2 25%
models/tf/blocks/optimizer.py 12 4 8 67%
models/tf/blocks/retrieval/base.py 15 9 6 40%
models/tf/blocks/retrieval/matrix_factorization.py 5 3 2 40%
models/tf/blocks/sampling/base.py 4 4 0 0%
models/tf/blocks/sampling/cross_batch.py 6 4 2 33%
models/tf/blocks/sampling/in_batch.py 7 5 2 29%
models/tf/core/aggregation.py 39 29 10 26%
models/tf/core/base.py 39 27 12 31%
models/tf/core/combinators.py 40 21 19 48%
models/tf/core/encoder.py 20 9 11 55%
models/tf/core/index.py 14 9 5 36%
models/tf/core/prediction.py 9 5 4 44%
models/tf/core/tabular.py 36 27 9 25%
models/tf/distributed/embedding.py 5 2 3 60%
models/tf/experimental/sample_weight.py 5 3 2 40%
models/tf/inputs/continuous.py 9 6 3 33%
models/tf/inputs/embedding.py 46 32 14 30%
models/tf/losses/base.py 1 1 0 0%
models/tf/metrics/evaluation.py 19 12 7 37%
models/tf/metrics/topk.py 25 15 10 40%
models/tf/models/base.py 62 34 28 45%
models/tf/models/utils.py 2 2 0 0%
models/tf/outputs/base.py 12 9 3 25%
models/tf/outputs/block.py 5 3 2 40%
models/tf/outputs/classification.py 15 10 5 33%
models/tf/outputs/contrastive.py 13 9 4 31%
models/tf/outputs/topk.py 13 5 8 62%
models/tf/outputs/sampling/base.py 10 7 3 30%
models/tf/outputs/sampling/in_batch.py 7 5 2 29%
models/tf/outputs/sampling/popularity.py 5 3 2 40%
models/tf/prediction_tasks/base.py 22 16 6 27%
models/tf/prediction_tasks/classification.py 12 8 4 33%
models/tf/prediction_tasks/next_item.py 6 3 3 50%
models/tf/prediction_tasks/regression.py 5 3 2 40%
models/tf/prediction_tasks/retrieval.py 5 4 1 20%
models/tf/transformers/block.py 16 7 9 56%
models/tf/transformers/transforms.py 23 14 9 39%
models/tf/transforms/bias.py 17 13 4 24%
models/tf/transforms/features.py 50 36 14 28%
models/tf/transforms/noise.py 5 4 1 20%
models/tf/transforms/regularization.py 3 2 1 33%
models/tf/transforms/sequence.py 48 23 25 52%
models/tf/transforms/tensor.py 5 4 1 20%
models/tf/utils/batch_utils.py 8 5 3 38%
models/tf/utils/repr_utils.py 5 5 0 0%
models/tf/utils/search_utils.py 3 3 0 0%
models/tf/utils/testing_utils.py 11 7 4 36%
models/tf/utils/tf_utils.py 24 15 9 38%
models/torch/losses.py 2 1 1 50%
models/torch/block/base.py 19 14 5 26%
models/torch/block/mlp.py 4 4 0 0%
models/torch/features/base.py 1 1 0 0%
models/torch/features/continuous.py 4 3 1 25%
models/torch/features/embedding.py 15 10 5 33%
models/torch/features/tabular.py 4 1 3 75%
models/torch/model/base.py 32 24 8 25%
models/torch/model/prediction_task.py 6 6 0 0%
models/torch/tabular/aggregation.py 13 9 4 31%
models/torch/tabular/base.py 29 15 14 48%
models/torch/tabular/transformations.py 9 7 2 22%
models/torch/utils/data_utils.py 14 9 5 36%
models/torch/utils/torch_utils.py 20 14 6 30%
models/utils/dataset.py 7 4 3 43%
models/utils/dependencies.py 4 4 0 0%
models/utils/doc_utils.py 1 1 0 0%
models/utils/misc_utils.py 9 4 5 56%
models/utils/nvt_utils.py 1 1 0 0%
models/utils/registry.py 19 12 7 37%
models/utils/schema_utils.py 12 9 3 25%
------------------------------------------------------------- -------------- ------------- -------------- ---------------
TOTAL 1172 692 480 41.0%
-------------------------------------------------------------------------------------------------------------------------
(16 of 98 files omitted due to complete coverage)

Transformers4Rec:
========================== Coverage for /workspace/01_MerlinDev/62_DocStrings/Transformers4Rec/ ===========================
--------------------------------------------------------- Summary ---------------------------------------------------------
Name Total Miss Cover Cover%
merlin_standard_lib/proto/schema_bp.py 47 8 39 83%
merlin_standard_lib/schema/schema.py 30 29 1 3%
merlin_standard_lib/utils/embedding_utils.py 2 2 0 0%
transformers4rec/config/schema.py 6 6 0 0%
transformers4rec/config/transformer.py 22 22 0 0%
transformers4rec/data/dataset.py 4 4 0 0%
transformers4rec/torch/experimental.py 4 3 1 25%
transformers4rec/torch/losses.py 2 1 1 50%
transformers4rec/torch/masking.py 19 8 11 58%
transformers4rec/torch/ranking_metric.py 9 8 1 11%
transformers4rec/torch/trainer.py 19 6 13 68%
transformers4rec/torch/block/base.py 19 19 0 0%
transformers4rec/torch/block/mlp.py 4 4 0 0%
transformers4rec/torch/block/transformer.py 8 5 3 38%
transformers4rec/torch/features/base.py 1 1 0 0%
transformers4rec/torch/features/continuous.py 4 3 1 25%
transformers4rec/torch/features/embedding.py 17 11 6 35%
transformers4rec/torch/features/sequence.py 9 6 3 33%
transformers4rec/torch/features/tabular.py 4 1 3 75%
transformers4rec/torch/model/base.py 31 20 11 35%
transformers4rec/torch/model/prediction_task.py 14 11 3 21%
transformers4rec/torch/tabular/aggregation.py 13 9 4 31%
transformers4rec/torch/tabular/base.py 29 15 14 48%
transformers4rec/torch/tabular/transformations.py 12 9 3 25%
transformers4rec/torch/utils/data_utils.py 14 8 6 43%
transformers4rec/torch/utils/examples_utils.py 4 1 3 75%
transformers4rec/torch/utils/schema_utils.py 1 1 0 0%
transformers4rec/torch/utils/torch_utils.py 29 16 13 45%
transformers4rec/utils/data_utils.py 4 2 2 50%
transformers4rec/utils/dependencies.py 3 3 0 0%
------------------------------------------------------------- -------------- ------------- -------------- ---------------
TOTAL 413 242 171 41.4%
-------------------------------------------------------------------------------------------------------------------------

Merlin Systems:
======================= Coverage for /workspace/01_MerlinDev/62_DocStrings/systems/merlin/systems/ ========================
--------------------------------------------------------- Summary ---------------------------------------------------------
Name Total Miss Cover Cover%
model_registry.py 4 2 2 50%
dag/ops/faiss.py 8 3 5 62%
dag/ops/feast.py 6 1 5 83%
dag/ops/fil.py 27 7 20 74%
dag/ops/implicit.py 6 2 4 67%
dag/ops/pytorch.py 4 1 3 75%
dag/ops/session_filter.py 5 1 4 80%
dag/ops/softmax_sampling.py 4 1 3 75%
dag/ops/tensorflow.py 5 1 4 80%
dag/ops/unroll_features.py 3 2 1 33%
dag/ops/workflow.py 4 1 3 75%
dag/runtimes/triton/ops/fil.py 11 1 10 91%
dag/runtimes/triton/ops/operator.py 4 1 3 75%
dag/runtimes/triton/ops/pytorch.py 6 1 5 83%
triton/utils.py 5 2 3 60%
triton/models/pytorch_model.py 3 1 2 67%
workflow/base.py 3 3 0 0%
workflow/hugectr.py 2 2 0 0%
workflow/pytorch.py 1 1 0 0%
workflow/tensorflow.py 1 1 0 0%
------------------------------------------------- ----------------- ---------------- ----------------- ------------------
TOTAL 176 35 141 80.1%
-------------------------------------------------------------------------------------------------------------------------

Merlin Core:
============================= Coverage for /workspace/01_MerlinDev/62_DocStrings/core/merlin/ =============================
--------------------------------------------------------- Summary ---------------------------------------------------------
Name Total Miss Cover Cover%
dag/base_operator.py 11 2 9 82%
dag/graph.py 5 2 3 60%
dag/node.py 13 4 9 69%
dtypes/shape.py 8 6 2 25%
schema/schema.py 22 5 17 77%
schema/io/schema_bp.py 51 11 40 78%
table/conversions.py 4 4 0 0%
table/cupy_column.py 6 1 5 83%
table/numpy_column.py 6 1 5 83%
table/tensor_column.py 7 1 6 86%
table/tensor_table.py 12 3 9 75%
table/tensorflow_column.py 8 3 5 62%
table/torch_column.py 6 1 5 83%
----------------------------------------- ------------------- ------------------ ------------------- --------------------
TOTAL 232 44 188 81.0%
-------------------------------------------------------------------------------------------------------------------------

DataLoader:
========================== Coverage for /workspace/01_MerlinDev/62_DocStrings/dataloader/merlin/ ==========================
--------------------------------------------------------- Summary ---------------------------------------------------------
Name Total Miss Cover Cover%
dataloader/loader_base.py 14 7 7 50%
dataloader/tensorflow.py 5 1 4 80%
dataloader/ops/embeddings/embedding_op.py 7 2 5 71%
dataloader/utils/tf/tf_trainer.py 2 1 1 50%
dataloader/utils/torch/torch_trainer_dist.py 4 3 1 25%
--------------------------------------------------------- --------------- -------------- --------------- ----------------
TOTAL 66 14 52 78.8%
-------------------------------------------------------------------------------------------------------------------------
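Since the ideas in this issue are not prioritised yet, one possible way to use the reports above is to rank files by the number of missing docstrings. The sketch below parses rows in the `Name Total Miss Cover Cover%` layout and lists the files with the most misses. The helper names (`parse_rows`, `most_missing`) are hypothetical, and the sample text is copied from the DataLoader report above. The listed rows do not sum to the `TOTAL` row, presumably because fully covered files are omitted from the listing (the Merlin Models report states this explicitly), so the sketch skips `TOTAL` instead of checking against it.

```python
import re

# One report row: file name, then Total, Miss, Cover, and Cover%.
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<total>\d+)\s+(?P<miss>\d+)\s+(?P<cover>\d+)\s+[\d.]+%$"
)


def parse_rows(report: str) -> list[tuple[str, int, int, int]]:
    """Parse per-file rows as (name, total, miss, cover), skipping TOTAL."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match and match["name"] != "TOTAL":
            rows.append(
                (
                    match["name"],
                    int(match["total"]),
                    int(match["miss"]),
                    int(match["cover"]),
                )
            )
    return rows


def most_missing(rows, n=3):
    """Return the n rows with the most missing docstrings."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Rows copied from the DataLoader report in this comment.
DATALOADER = """\
dataloader/loader_base.py 14 7 7 50%
dataloader/tensorflow.py 5 1 4 80%
dataloader/ops/embeddings/embedding_op.py 7 2 5 71%
dataloader/utils/tf/tf_trainer.py 2 1 1 50%
dataloader/utils/torch/torch_trainer_dist.py 4 3 1 25%
TOTAL 66 14 52 78.8%
"""

rows = parse_rows(DATALOADER)
for name, total, miss, cover in most_missing(rows):
    print(f"{name}: {miss} of {total} missing")
```

Ranking by the absolute miss count rather than by percentage is a design choice: a file with 2 nodes at 0% matters less than one with 40 nodes at 30%.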