Windows 10 - Githubissues

Pascal66 commented 4 years ago

Is your feature request related to a problem? Please describe. I'm always frustrated when I can't try PyTorch/CUDA/Python things under Windows.

Describe the solution you'd like: a smooth install.

Describe alternatives you've considered: a working setup through conda or WSL Ubuntu.

Additional context: WSL Ubuntu on Windows 10 has problems with CUDA. conda on Windows 10 has problems with VC14++ compilation, mostly on torchsearchsorted.

MultiPath commented 4 years ago

Hi, we have not tested on Windows 10 yet. Our code is based on CUDA, so it is not possible to use WSL1. WSL2 recently added CUDA support, but I have not managed to test it successfully.

Our code no longer relies on torchsearchsorted. Maybe you can delete that requirement and try installing on Windows again?

Pascal66 commented 4 years ago

Perfect, I'll try without torchsearchsorted! (Not using WSL1 or WSL2, just Anaconda.)

For now, after removing torchsearchsorted, installing requirements.txt works. After modifying setup.py to use backslashes and a hardcoded path, it works too.

Unfortunately, fairnr still NEEDS torchsearchsorted:
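As an aside on the setup.py change: hardcoded backslash paths can usually be avoided by letting pathlib build the path for the current platform. The sketch below is only an illustration. The helper name `collect_sources` and the `fairnr/clib/src` layout are assumptions, not taken from the actual setup.py.

```python
import tempfile
from pathlib import Path


def collect_sources(root, subdir=("fairnr", "clib", "src"), patterns=("*.cpp", "*.cu")):
    """Collect extension source files without hardcoding path separators.

    `subdir` is a hypothetical layout; adjust it to the real source directory.
    """
    src_dir = Path(root).joinpath(*subdir)
    # Path.glob works with the native separator on both Windows and Linux.
    return sorted(str(p) for pattern in patterns for p in src_dir.glob(pattern))


# Small self-contained demo on a temporary directory tree.
with tempfile.TemporaryDirectory() as root:
    src = Path(root, "fairnr", "clib", "src")
    src.mkdir(parents=True)
    for name in ("ops.cpp", "kernel.cu", "notes.txt"):
        (src / name).touch()
    found = [Path(p).name for p in collect_sources(root)]

print(found)  # ['kernel.cu', 'ops.cpp']
```

The returned list of strings could then be passed wherever the build script expects its list of source files.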

>>> import fairnr.clib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Pascal\git\NSVF\fairnr\__init__.py", line 11, in <module>
    from . import data, tasks, models, modules, criterions
  File "C:\Users\Pascal\git\NSVF\fairnr\models\__init__.py", line 15, in <module>
    module = importlib.import_module('fairnr.models.' + model_name)
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Pascal\git\NSVF\fairnr\models\fairnr_model.py", line 22, in <module>
    from fairnr.modules.encoder import get_encoder
  File "C:\Users\Pascal\git\NSVF\fairnr\modules\__init__.py", line 15, in <module>
    module = importlib.import_module('fairnr.modules.' + model_name)
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\Pascal\git\NSVF\fairnr\modules\renderer.py", line 15, in <module>
    from torchsearchsorted import searchsorted
ModuleNotFoundError: No module named 'torchsearchsorted'
>>>

Pascal66 commented 4 years ago

I modified \fairnr\modules\renderer.py to use torch.searchsorted instead of the searchsorted from torchsearchsorted.
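A minimal sketch of what such a replacement might look like, assuming a PyTorch version that provides torch.searchsorted (1.6 or later) and that the call site only needs the `a`, `v` and `side` arguments. The wrapper below is illustrative and is not the actual change made to renderer.py.

```python
import torch


def searchsorted(a, v, side="left"):
    """Drop-in style wrapper around torch.searchsorted.

    a: (rows, n) tensor, sorted along the last dimension
    v: (rows, m) tensor of values to locate in each row of `a`
    side: "left" or "right", mapped to torch's `right` flag
    """
    return torch.searchsorted(a.contiguous(), v.contiguous(), right=(side == "right"))


a = torch.tensor([[1.0, 3.0, 5.0, 7.0]])
v = torch.tensor([[3.0, 6.0]])
print(searchsorted(a, v))                # tensor([[1, 3]])
print(searchsorted(a, v, side="right"))  # tensor([[2, 3]])
```

With a wrapper like this, the `from torchsearchsorted import searchsorted` line could be replaced by a local definition, leaving the rest of the module unchanged. Note that torch.searchsorted returns int64 indices, so any code that expected a different index dtype may need a cast.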

Pascal66 commented 4 years ago

Another problem :

2020-10-26 08:15:34 | INFO | fairnr_cli.train | model nsvf_base, criterion SRNLossCriterion
2020-10-26 08:15:34 | INFO | fairnr_cli.train | num. model params: 582737 (num. trained: 582724)
2020-10-26 08:15:37 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2020-10-26 08:15:37 | INFO | fairseq.utils | rank   0: capabilities =  7.5  ; total memory = 6.000 GB ; name = GeForce RTX 2060                
2020-10-26 08:15:37 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2020-10-26 08:15:37 | INFO | fairnr_cli.train | training on 1 GPUs
2020-10-26 08:15:37 | INFO | fairnr_cli.train | max tokens per GPU = None and max sentences per GPU = 1
2020-10-26 08:15:37 | INFO | fairseq.trainer | no existing checkpoint found checkpoint\Wineholder\nsvf_basev1\checkpoint_last.pt
2020-10-26 08:15:37 | INFO | fairseq.trainer | loading train data for epoch 1
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    cli_main()
  File "C:\Users\Pascal\git\NSVF\fairnr_cli\train.py", line 373, in cli_main
    main(args)
  File "C:\Users\Pascal\git\NSVF\fairnr_cli\train.py", line 91, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\checkpoint_utils.py", line 158, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\trainer.py", line 335, in get_train_iterator
    return self.task.get_batch_iterator(
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\tasks\fairseq_task.py", line 180, in get_batch_iterator
    batch_sampler = dataset.batch_by_size(
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\base_wrapper_dataset.py", line 59, in batch_by_size
    return self.dataset.batch_by_size(
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\fairseq_dataset.py", line 116, in batch_by_size
    return data_utils.batch_by_size(
  File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\data_utils.py", line 249, in batch_by_size
    return batch_by_size_fast(
  File "fairseq\data\data_utils_fast.pyx", line 27, in fairseq.data.data_utils_fast.batch_by_size_fast
ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'long'

MultiPath commented 4 years ago

Do you have scripts to reproduce this error?

Pascal66 commented 4 years ago

It's the same as the original bash script:

(nsvf38) C:\Users\Pascal\git\NSVF>python train.py C:\Users\Pascal\git\NSVF\Synthetic_NSVF\Wineholder --user-dir fairnr --task single_object_rendering --train-views "0..100" --view-resolution 800x800 --max-sentences 1 --view-per-batch 2 --pixel-per-view 2048 --no-preload --sampling-on-mask 1.0 --no-sampling-at-reader --valid-view-resolution 800x800 --valid-views "100..200" --valid-view-per-batch 1 --transparent-background "1.0,1.0,1.0" --background-stop-gradient --arch nsvf_base --initial-boundingbox C:\Users\Pascal\git\NSVF\Synthetic_NSVF\Wineholder\bbox.txt --raymarching-stepsize-ratio 0.125 --use-octree --discrete-regularization --color-weight 128.0 --alpha-weight 1.0 --optimizer "adam" --adam-betas "(0.9, 0.999)" --lr-scheduler "polynomial_decay" --total-num-update 150000 --lr 0.001 --clip-norm 0.0 --criterion "srn_loss" --num-workers 0 --seed 2 --save-interval-updates 500 --max-update 150000 --virtual-epoch-steps 5000 --save-interval 1 --half-voxel-size-at "5000,25000,75000" --reduce-step-size-at "5000,25000,75000" --pruning-every-steps 2500 --keep-interval-updates 5 --log-format simple --log-interval 1 --tensorboard-logdir checkpoint\Wineholder\tensorboard\nsvf_basev1 --save-dir checkpoint\Wineholder\nsvf_basev1

Pascal66 commented 4 years ago

Which gives the call:

2020-10-26 17:11:18 | INFO | fairnr_cli.train | Namespace(L1=False, adam_betas='(0.9, 0.999)', adam_eps=1e-08, all_gather_list_size=16384, alpha_weight=1.0, arch='nsvf_base', background_depth=5.0, background_stop_gradient=True, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', chunk_size=64, clip_norm=0.0, color_weight=128.0, cpu=False, criterion='srn_loss', curriculum=0, data='C:\\Users\\Pascal\\git\\NSVF\\Synthetic_NSVF\\Wineholder', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', density_embed_dim=128, depth_weight=0.0, depth_weight_decay=None, deterministic_step=False, device_id=0, disable_validation=False, discrete_regularization=True, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', empty_cache_freq=0, end_learning_rate=0.0, eval_lpips=False, fast_stat_sync=False, feature_embed_dim=256, feature_layers=1, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, half_voxel_size_at='5000,25000,75000', initial_boundingbox='C:\\Users\\Pascal\\git\\NSVF\\Synthetic_NSVF\\Wineholder\\bbox.txt', inputs_to_density='emb:6:32', inputs_to_texture='feat:0:256, ray:4', keep_best_checkpoints=-1, keep_interval_updates=5, keep_last_epochs=-1, load_depth=False, load_mask=False, localsgd_frequency=3, log_format='simple', log_interval=1, lr=[0.001], lr_scheduler='polynomial_decay', max_epoch=0, max_hits=60, max_sentences=1, max_sentences_valid=1, max_tokens=None, max_tokens_valid=None, max_update=150000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_color=-1, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, no_background_loss=False,
no_epoch_checkpoints=False, no_last_checkpoints=False, no_load_binary=False, no_preload=True, no_progress_bar=False, no_sampling_at_reader=True, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_workers=0, object_id_path=None, optimizer='adam', optimizer_overrides='{}', output_valid=None, patience=-1, pixel_per_view=2048.0, power=1.0, profile=False, pruning_every_steps=2500, pruning_rerun_train_set=False, pruning_th=0.5, pruning_with_train_stats=False, quantization_config_path=None, raymarching_stepsize=0.01, raymarching_stepsize_ratio=0.125, raymarching_tolerance=0, reduce_step_size_at='5000,25000,75000', rendering_args=None, rendering_every_steps=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sampling_at_center=1.0, sampling_on_bbox=False, sampling_on_mask=1.0, sampling_patch_size=1, sampling_skipping_size=1, save_dir='checkpoint\\Wineholder\\nsvf_basev1', save_interval=1, save_interval_updates=500, scoring='bleu', seed=2, sentence_avg=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, subsample_valid=-1, task='single_object_rendering', tensorboard_logdir='checkpoint\\Wineholder\\tensorboard\\nsvf_basev1', test_views='0', texture_embed_dim=256, texture_layers=3, threshold_loss_scale=None, tokenizer=None, total_num_update=150000, tpu=False, train_subset='train', train_views='0..100', transparent_background='1.0,1.0,1.0', update_freq=[1], use_bmuf=False, use_octree=True, use_old_adam=False, user_dir='fairnr', valid_chunk_size=64, valid_subset='valid', valid_view_per_batch=1, valid_view_resolution='800x800', valid_views='100..200', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, vgg_level=2, vgg_weight=0.0, view_per_batch=2, view_resolution='800x800', virtual_epoch_steps=5000, voxel_embed_dim=32, 
voxel_path=None, voxel_size=0.25, warmup_updates=0, weight_decay=0.0)

MultiPath commented 4 years ago

It looks like the error is raised from fairseq (https://github.com/pytorch/fairseq). Did you install it based on what I put in requirements.txt? I did not use the up-to-date version, to avoid big code changes.

Pascal66 commented 4 years ago

It's fairseq 0.9.0, from your line in requirements.txt.

There is a similar issue in fairseq: https://github.com/pytorch/fairseq/issues/2483

"The issue is as expected: default (non-portable) dtypes in FairseqDataset#ordered_indices"

I'll try the patch.
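For context, the "Buffer dtype mismatch" error above is consistent with an index array whose integer type is narrower than what the compiled function expects: with NumPy 1.x on Windows the default integer is 32-bit, while on Linux it is 64-bit. A minimal sketch of a dtype fix, assuming the compiled code expects int64. The helper name `portable_indices` is hypothetical and this is not the actual fairseq patch.

```python
import numpy as np


def portable_indices(indices):
    """Return the indices as a contiguous int64 array, whatever the platform default is."""
    return np.ascontiguousarray(indices, dtype=np.int64)


# Simulate the 32-bit default integer array that NumPy 1.x produces on Windows.
windows_like = np.arange(5, dtype=np.int32)
fixed = portable_indices(windows_like)

print(windows_like.dtype, "->", fixed.dtype)  # int32 -> int64
```

The same effect can be had at the source by passing an explicit dtype when the indices are created, for example `np.arange(n, dtype=np.int64)`, so the array no longer depends on the platform default.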

mwalczyk commented 4 years ago

@Pascal66 were you able to fix that last issue (regarding dtypes) with the aforementioned patch? I'm still getting the same error here.

Pascal66 commented 4 years ago

@mwalczyk Unfortunately no. As @MultiPath says, he didn't use the up-to-date version of fairseq, so I don't really know where to apply the patch.

facebookresearch / NSVF

Windows 10 #22