Pascal66 closed this issue 4 years ago
Hi, we have not tested on Windows 10 yet. Our code depends on CUDA, so WSL1 is not an option. WSL2 recently gained CUDA support, but I have not managed to test it successfully.
Our code no longer relies on torchsearchsorted. Maybe you can delete that requirement and try installing on Windows again?
Perfect, I'll try without torchsearchsorted! (Not using WSL1 or WSL2, just Anaconda.)
For now, after removing torchsearchsorted, installing requirements.txt works. After modifying setup.py to use backslashes and a hardcoded path, the build works too.
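The setup.py change mentioned above hardcodes Windows paths. A more portable option, sketched here under assumptions, is to build the source list with pathlib so that separators follow the host OS. The helper name `collect_sources` and the `src` directory are hypothetical and not taken from the NSVF repository.

```python
from pathlib import Path


def collect_sources(root, patterns=("*.cpp", "*.cu")):
    """Return source files directly under `root` as OS-native path strings.

    `root` and `patterns` are illustrative; adapt them to the real layout.
    """
    root = Path(root)
    found = []
    for pattern in patterns:
        # sorted() keeps the order stable within each pattern
        found.extend(sorted(root.glob(pattern)))
    return [str(path) for path in found]


if __name__ == "__main__":
    # Hypothetical location, relative to the current working directory
    print(collect_sources(Path.cwd() / "fairnr" / "clib" / "src"))
```

A list built this way can be passed to the extension definition in setup.py without editing separators by hand on each platform.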
Unfortunately, fairnr still NEEDS torchsearchsorted:
>>> import fairnr.clib
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Pascal\git\NSVF\fairnr\__init__.py", line 11, in <module>
from . import data, tasks, models, modules, criterions
File "C:\Users\Pascal\git\NSVF\fairnr\models\__init__.py", line 15, in <module>
module = importlib.import_module('fairnr.models.' + model_name)
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Users\Pascal\git\NSVF\fairnr\models\fairnr_model.py", line 22, in <module>
from fairnr.modules.encoder import get_encoder
File "C:\Users\Pascal\git\NSVF\fairnr\modules\__init__.py", line 15, in <module>
module = importlib.import_module('fairnr.modules.' + model_name)
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\importlib\__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Users\Pascal\git\NSVF\fairnr\modules\renderer.py", line 15, in <module>
from torchsearchsorted import searchsorted
ModuleNotFoundError: No module named 'torchsearchsorted'
>>>
I modified fairnr\modules\renderer.py to use torch.searchsorted instead of the searchsorted function from torchsearchsorted.
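The swap described above could look roughly like the following. This is a sketch, not the actual change made to renderer.py: the wrapper name `searchsorted_compat` is hypothetical, and it assumes the old call sites passed a `side` keyword, whereas `torch.searchsorted` (built into PyTorch 1.6 and later) takes a boolean `right`.

```python
try:
    import torch  # torch.searchsorted is built in from PyTorch 1.6 onward
except ImportError:  # lets this module import on machines without PyTorch
    torch = None


def searchsorted_compat(sorted_seq, values, side="left"):
    """Hypothetical drop-in wrapper mapping a `side=` keyword onto
    torch.searchsorted's boolean `right=` argument."""
    if torch is None or not hasattr(torch, "searchsorted"):
        raise RuntimeError("PyTorch >= 1.6 is required for torch.searchsorted")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    return torch.searchsorted(sorted_seq, values, right=(side == "right"))


if torch is not None and hasattr(torch, "searchsorted"):
    # One sorted row per batch entry; values are looked up row by row
    cdf = torch.tensor([[0.0, 0.25, 0.5, 1.0]])
    u = torch.tensor([[0.25, 0.6]])
    print(searchsorted_compat(cdf, u, side="left"))
    print(searchsorted_compat(cdf, u, side="right"))
```

Keeping the wrapper in one place means the remaining call sites only need their import line changed.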
Another problem:
2020-10-26 08:15:34 | INFO | fairnr_cli.train | model nsvf_base, criterion SRNLossCriterion
2020-10-26 08:15:34 | INFO | fairnr_cli.train | num. model params: 582737 (num. trained: 582724)
2020-10-26 08:15:37 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2020-10-26 08:15:37 | INFO | fairseq.utils | rank 0: capabilities = 7.5 ; total memory = 6.000 GB ; name = GeForce RTX 2060
2020-10-26 08:15:37 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2020-10-26 08:15:37 | INFO | fairnr_cli.train | training on 1 GPUs
2020-10-26 08:15:37 | INFO | fairnr_cli.train | max tokens per GPU = None and max sentences per GPU = 1
2020-10-26 08:15:37 | INFO | fairseq.trainer | no existing checkpoint found checkpoint\Wineholder\nsvf_basev1\checkpoint_last.pt
2020-10-26 08:15:37 | INFO | fairseq.trainer | loading train data for epoch 1
Traceback (most recent call last):
File "train.py", line 20, in <module>
cli_main()
File "C:\Users\Pascal\git\NSVF\fairnr_cli\train.py", line 373, in cli_main
main(args)
File "C:\Users\Pascal\git\NSVF\fairnr_cli\train.py", line 91, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\checkpoint_utils.py", line 158, in load_checkpoint
epoch_itr = trainer.get_train_iterator(
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\trainer.py", line 335, in get_train_iterator
return self.task.get_batch_iterator(
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\tasks\fairseq_task.py", line 180, in get_batch_iterator
batch_sampler = dataset.batch_by_size(
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\base_wrapper_dataset.py", line 59, in batch_by_size
return self.dataset.batch_by_size(
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\fairseq_dataset.py", line 116, in batch_by_size
return data_utils.batch_by_size(
File "C:\Users\Pascal\anaconda202007\envs\nsvf38\lib\site-packages\fairseq\data\data_utils.py", line 249, in batch_by_size
return batch_by_size_fast(
File "fairseq\data\data_utils_fast.pyx", line 27, in fairseq.data.data_utils_fast.batch_by_size_fast
ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'long'
Do you have scripts to reproduce this error?
It's the same as the original bash script:
(nsvf38) C:\Users\Pascal\git\NSVF>python train.py C:\Users\Pascal\git\NSVF\Synthetic_NSVF\Wineholder --user-dir fairnr --task single_object_rendering --train-views "0..100" --view-resolution 800x800 --max-sentences 1 --view-per-batch 2 --pixel-per-view 2048 --no-preload --sampling-on-mask 1.0 --no-sampling-at-reader --valid-view-resolution 800x800 --valid-views "100..200" --valid-view-per-batch 1 --transparent-background "1.0,1.0,1.0" --background-stop-gradient --arch nsvf_base --initial-boundingbox C:\Users\Pascal\git\NSVF\Synthetic_NSVF\Wineholder\bbox.txt --raymarching-stepsize-ratio 0.125 --use-octree --discrete-regularization --color-weight 128.0 --alpha-weight 1.0 --optimizer "adam" --adam-betas "(0.9, 0.999)" --lr-scheduler "polynomial_decay" --total-num-update 150000 --lr 0.001 --clip-norm 0.0 --criterion "srn_loss" --num-workers 0 --seed 2 --save-interval-updates 500 --max-update 150000 --virtual-epoch-steps 5000 --save-interval 1 --half-voxel-size-at "5000,25000,75000" --reduce-step-size-at "5000,25000,75000" --pruning-every-steps 2500 --keep-interval-updates 5 --log-format simple --log-interval 1 --tensorboard-logdir checkpoint\Wineholder\tensorboard\nsvf_basev1 --save-dir checkpoint\Wineholder\nsvf_basev1
Which gives this call:
2020-10-26 17:11:18 | INFO | fairnr_cli.train | Namespace(L1=False, adam_betas='(0.9, 0.999)', adam_eps=1e-08, all_gather_list_size=16384, alpha_weight=1.0, arch='nsvf_base', background_depth=5.0, background_stop_gradient=True, best_checkpoint_metric='loss', bf16=False, bpe=None, broadcast_buffers=False, bucket_cap_mb=25, checkpoint_suffix='', chunk_size=64, clip_norm=0.0, color_weight=128.0, cpu=False, criterion='srn_loss', curriculum=0, data='C:\\Users\\Pascal\\git\\NSVF\\Synthetic_NSVF\\Wineholder', data_buffer_size=10, dataset_impl=None, ddp_backend='c10d', density_embed_dim=128, depth_weight=0.0, depth_weight_decay=None, deterministic_step=False, device_id=0, disable_validation=False, discrete_regularization=True, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, distributed_wrapper='DDP', empty_cache_freq=0, end_learning_rate=0.0, eval_lpips=False, fast_stat_sync=False, feature_embed_dim=256, feature_layers=1, find_unused_parameters=False, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, half_voxel_size_at='5000,25000,75000', initial_boundingbox='C:\\Users\\Pascal\\git\\NSVF\\Synthetic_NSVF\\Wineholder\\bbox.txt', inputs_to_density='emb:6:32', inputs_to_texture='feat:0:256, ray:4', keep_best_checkpoints=-1, keep_interval_updates=5, keep_last_epochs=-1, load_depth=False, load_mask=False, localsgd_frequency=3, log_format='simple', log_interval=1, lr=[0.001], lr_scheduler='polynomial_decay', max_epoch=0, max_hits=60, max_sentences=1, max_sentences_valid=1, max_tokens=None, max_tokens_valid=None, max_update=150000, maximize_best_checkpoint_metric=False, memory_efficient_bf16=False, memory_efficient_fp16=False, min_color=-1, min_loss_scale=0.0001, min_lr=-1, model_parallel_size=1, no_background_loss=False, 
no_epoch_checkpoints=False, no_last_checkpoints=False, no_load_binary=False, no_preload=True, no_progress_bar=False, no_sampling_at_reader=True, no_save=False, no_save_optimizer_state=False, no_seed_provided=False, nprocs_per_node=1, num_workers=0, object_id_path=None, optimizer='adam', optimizer_overrides='{}', output_valid=None, patience=-1, pixel_per_view=2048.0, power=1.0, profile=False, pruning_every_steps=2500, pruning_rerun_train_set=False, pruning_th=0.5, pruning_with_train_stats=False, quantization_config_path=None, raymarching_stepsize=0.01, raymarching_stepsize_ratio=0.125, raymarching_tolerance=0, reduce_step_size_at='5000,25000,75000', rendering_args=None, rendering_every_steps=None, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', sampling_at_center=1.0, sampling_on_bbox=False, sampling_on_mask=1.0, sampling_patch_size=1, sampling_skipping_size=1, save_dir='checkpoint\\Wineholder\\nsvf_basev1', save_interval=1, save_interval_updates=500, scoring='bleu', seed=2, sentence_avg=False, skip_invalid_size_inputs_valid_test=False, slowmo_algorithm='LocalSGD', slowmo_momentum=None, stop_time_hours=0, subsample_valid=-1, task='single_object_rendering', tensorboard_logdir='checkpoint\\Wineholder\\tensorboard\\nsvf_basev1', test_views='0', texture_embed_dim=256, texture_layers=3, threshold_loss_scale=None, tokenizer=None, total_num_update=150000, tpu=False, train_subset='train', train_views='0..100', transparent_background='1.0,1.0,1.0', update_freq=[1], use_bmuf=False, use_octree=True, use_old_adam=False, user_dir='fairnr', valid_chunk_size=64, valid_subset='valid', valid_view_per_batch=1, valid_view_resolution='800x800', valid_views='100..200', validate_after_updates=0, validate_interval=1, validate_interval_updates=0, vgg_level=2, vgg_weight=0.0, view_per_batch=2, view_resolution='800x800', virtual_epoch_steps=5000, voxel_embed_dim=32, 
voxel_path=None, voxel_size=0.25, warmup_updates=0, weight_decay=0.0)
It looks like the error is raised from fairseq (https://github.com/pytorch/fairseq). Did you install it based on what I put in requirements.txt? I did not use the latest version, to avoid big code changes.
It's fairseq 0.9.0, from your line in requirements.txt.
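Since the question is which fairseq build is actually installed, one quick check is to print the installed distribution versions and compare them with the pins in requirements.txt. This is a sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report what the active environment really contains
for name in ("fairseq", "torch", "numpy"):
    print(name, installed_version(name))
```

Running this inside the activated conda environment avoids confusing it with packages from another interpreter.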
There is a similar issue in fairseq: https://github.com/pytorch/fairseq/issues/2483
"The issue is as expected: default (non-portable) dtypes in FairseqDataset#ordered_indices"
I'll try the patch
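To make the quoted diagnosis concrete: before NumPy 2.0, NumPy's default integer on Windows was 32-bit (C `long`), while a compiled routine typed for 64-bit indices rejects such a buffer. A minimal sketch of the idea behind the fix, assuming the index array comes from a plain `np.arange` call as the quoted issue suggests, is to request `np.int64` explicitly. The helper names below are hypothetical, and this is not the fairseq patch itself.

```python
import numpy as np


def portable_indices(n):
    """Return indices 0..n-1 as int64 regardless of platform.

    Hypothetical helper: np.arange(n) alone uses the platform default
    integer, which was 32-bit on Windows before NumPy 2.0.
    """
    return np.arange(n, dtype=np.int64)


def as_int64(indices):
    """Convert an existing index array to int64, copying only if needed."""
    return np.asarray(indices, dtype=np.int64)


default = np.arange(4)
print(default.dtype)               # platform dependent
print(portable_indices(4).dtype)   # int64
print(as_int64(default).dtype)     # int64
```

Either form yields an array whose dtype no longer depends on the operating system.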
@Pascal66 Were you able to fix that last issue (regarding dtypes) with the aforementioned patch? I'm still getting the same error here.
@mwalczyk Unfortunately not. As @MultiPath said, he didn't use the up-to-date version of fairseq, so I don't really know where to apply the patch.
Is your feature request related to a problem? Please describe: I'm always frustrated when I can't try PyTorch/CUDA/Python things under Windows.
Describe the solution you'd like: a smooth install.
Describe alternatives you've considered: a working setup with conda or WSL Ubuntu.
Additional context: WSL Ubuntu on Windows 10 has problems with CUDA. Conda on Windows 10 has problems with VC14++ compilation, mostly on torchsearchsorted.