Closed: mike-w-wilson closed this issue 3 years ago
Hi Mike, and thanks for the clear message. The issue here is almost certainly the small reference panel.
Try changing the following parameter in the config file: founders_ratios = [0.5, 0.45, 0.05] # was [0.8, 0.15, 0.05]
The pipeline wasn't designed for such a small panel, but let's see if this works and go from there.
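To see why a small panel can leave a split empty, here is a rough sketch of how the three ratios might translate into sample counts. It assumes the three entries of founders_ratios correspond to train1, train2 and val, that the split is applied per population, and that counts are rounded down. None of these details is confirmed in this thread. `split_counts` is a hypothetical helper, not part of XGMix, and the 5-individuals-per-population figure is made up for illustration.

```python
import math


def split_counts(n_samples, ratios):
    """Return how many samples each split would get, rounding down.

    Assumption: the pipeline floors ratio * n_samples. The actual
    rounding rule used by XGMix may differ.
    """
    return [math.floor(n_samples * r) for r in ratios]


# Hypothetical case: 20 reference individuals spread evenly over
# 4 populations, i.e. 5 individuals per population.
per_population = 5

old = split_counts(per_population, [0.8, 0.15, 0.05])
new = split_counts(per_population, [0.5, 0.45, 0.05])

print("old ratios ->", old)  # counts for train1, train2, val
print("new ratios ->", new)
```

Under these assumptions the old ratios leave nothing for the second split, and both settings leave nothing for the third, which is consistent with a panel this small being the root cause.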
Thanks for the help! The second simulation ran but I did hit a new error.
--------------------------------------------------------------------------------
----------------------------------- XGMix ------------------------------------
--------------------------------------------------------------------------------
When using this software, please cite:
Kumar, A., Montserrat, D.M., Bustamante, C. and Ioannidis, A.
"XGMix: Local-Ancestry Inference With Stacked XGBoost"
International Conference on Learning Representations Workshops
ICLR, 2020, Workshop AI4AH
https://www.biorxiv.org/content/10.1101/2020.04.21.053876v1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Launching XGMix in train mode...
Reading sample maps and splitting in train/val...
path created: test_xgmix/generated_data/sample_maps
Running simulation...
Fast admix...
path created: test_xgmix/generated_data/chmchr20
path created: test_xgmix/generated_data/chmchr20/simulation_output
File read: 324568 SNPs for 20 individuals
path created: test_xgmix/generated_data/chmchr20/simulation_output/train1
Building founders
Simulating...
Simulating generation 2
Simulating generation 4
Simulating generation 6
Simulating generation 8
Simulating generation 12
Simulating generation 16
Simulating generation 24
Simulating generation 32
Simulating generation 48
Writing generation: 0
Writing generation: 2
Writing generation: 4
Writing generation: 6
Writing generation: 8
Writing generation: 12
Writing generation: 16
Writing generation: 24
Writing generation: 32
Writing generation: 48
path created: test_xgmix/generated_data/chmchr20/simulation_output/train2
Building founders
Simulating...
Simulating generation 2
Simulating generation 4
Simulating generation 6
Simulating generation 8
Simulating generation 12
Simulating generation 16
Simulating generation 24
Simulating generation 32
Simulating generation 48
Writing generation: 0
Writing generation: 2
Writing generation: 4
Writing generation: 6
Writing generation: 8
Writing generation: 12
Writing generation: 16
Writing generation: 24
Writing generation: 32
Writing generation: 48
Simulation done.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Preprocessing data...
Traceback (most recent call last):
File "XGMIX.py", line 368, in <module>
instance_name=instance_name, mode_filter_size=mode_filter_size, smooth_depth=smooth_depth)
File "XGMIX.py", line 264, in main
mode_filter_size, smooth_depth, gen_0, output_path)
File "XGMIX.py", line 121, in train
X_train1_raw, labels_train1_raw, X_train2_raw, labels_train2_raw, X_val_raw, labels_val_raw = [load_np_data(f) for f in train_val_files]
File "XGMIX.py", line 121, in <listcomp>
X_train1_raw, labels_train1_raw, X_train2_raw, labels_train2_raw, X_val_raw, labels_val_raw = [load_np_data(f) for f in train_val_files]
File "/XGMix/Utils/preprocess.py", line 17, in load_np_data
data.append(np.load(f).astype(np.int16))
File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 416, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'test_xgmix/generated_data/chmchr20/simulation_output/val/gen_2/mat_vcf_2d.npy'
This is going to be run on a much larger dataset, so I can increase my test reference panel size. Do you have a suggested minimum size?
Hi again. Yes, this is basically the same error, except that now you're missing samples in the validation set. If you have a larger reference panel, I would always suggest using it, as it will significantly increase performance.
Having more than 50 samples will certainly get you out of these simple errors, but again, I would suggest using the full reference panel you have available.
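If it helps to catch this before the preprocessing step fails, below is a minimal sketch of a pre-flight check that lists which simulation outputs are missing. The directory layout (`<split>/gen_<g>/mat_vcf_2d.npy`) is inferred from the single path in the traceback above, and the generation list is copied from the log. `missing_simulation_files` is a hypothetical helper, not an XGMix function, and the set of files XGMix actually loads may differ.

```python
from pathlib import Path

# Generations printed in the simulation log above.
GENERATIONS = [0, 2, 4, 6, 8, 12, 16, 24, 32, 48]
# Split names seen in the log and traceback.
SPLITS = ["train1", "train2", "val"]


def missing_simulation_files(sim_dir, splits=SPLITS, generations=GENERATIONS,
                             filename="mat_vcf_2d.npy"):
    """Return the expected per-generation files that do not exist.

    Assumption: each split stores one file per generation at
    <sim_dir>/<split>/gen_<generation>/<filename>.
    """
    sim_dir = Path(sim_dir)
    missing = []
    for split in splits:
        for gen in generations:
            path = sim_dir / split / f"gen_{gen}" / filename
            if not path.is_file():
                missing.append(path)
    return missing


# Example: report anything missing from the output directory in the log.
for path in missing_simulation_files(
        "test_xgmix/generated_data/chmchr20/simulation_output"):
    print("missing:", path)
```

An empty result means every expected file is present. A whole split showing up as missing points to that split having received no samples.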
I no longer receive errors running with my full reference panel, thank you!
Hello,
I am running a small test on XGMIX.py and the tool is failing to find the "train2" directory after completing the training simulation. It appears that the tool never starts the train2 simulation.
Do you have any insight as to why the tool does not run the second training simulation? I am testing a small 5MB portion of chr20 with 20 reference individuals. My input command is:
The same input works with RFMix2, if that helps determine why the tool does not produce the train2 directory. I appreciate your assistance!