MAGICS-LAB / DNABERT_2

[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Apache License 2.0
268 stars · 60 forks

CompilationError: at 114:24: #19

Open QAQ1551QAQ opened 1 year ago

QAQ1551QAQ commented 1 year ago

Epoch [1/3]

KeyError Traceback (most recent call last) File <string>:21, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

KeyError: ('2-.-0-.-0--d6252949da17ceb5f3a278a70250af13-3b85c7bef5f0a641282f3b73af50f599-14de7de5c4da5794c8ca14e7e41a122d-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (False, False), (False, False)))

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last) File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:937, in build_triton_ir(fn, signature, specialization, constants) 936 try: --> 937 generator.visit(fn.parse()) 938 except Exception as e:

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:183, in CodeGenerator.visit_Module(self, node) 182 def visit_Module(self, node): --> 183 ast.NodeVisitor.generic_visit(self, node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:379, in NodeVisitor.generic_visit(self, node) 378 if isinstance(item, AST): --> 379 self.visit(item) 380 elif isinstance(value, AST):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:252, in CodeGenerator.visit_FunctionDef(self, node) 251 # visit function body --> 252 has_ret = self.visit_compound_statement(node.body) 253 # finalize function

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts) 176 for stmt in stmts: --> 177 self.last_ret_type = self.visit(stmt) 178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:678, in CodeGenerator.visit_For(self, node) 677 self.scf_stack.append(node) --> 678 self.visit_compound_statement(node.body) 679 self.scf_stack.pop()

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:177, in CodeGenerator.visit_compound_statement(self, stmts) 176 for stmt in stmts: --> 177 self.last_ret_type = self.visit(stmt) 178 if isinstance(stmt, ast.Return):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:319, in CodeGenerator.visit_AugAssign(self, node) 318 assign = ast.Assign(targets=[node.target], value=rhs) --> 319 self.visit(assign) 320 return self.get_value(name)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:301, in CodeGenerator.visit_Assign(self, node) 300 names = _names[0] --> 301 values = self.visit(node.value) 302 if not isinstance(names, tuple):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:339, in CodeGenerator.visit_BinOp(self, node) 338 lhs = self.visit(node.left) --> 339 rhs = self.visit(node.right) 340 fn = { 341 ast.Add: 'add', 342 ast.Sub: 'sub', (...) 352 ast.BitXor: 'xor', 353 }[type(node.op)]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:855, in CodeGenerator.visit(self, node) 854 warnings.simplefilter("ignore", PendingDeprecationWarning) # python 3.8 --> 855 return super().visit(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/ast.py:371, in NodeVisitor.visit(self, node) 370 visitor = getattr(self, method, self.generic_visit) --> 371 return visitor(node)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:797, in CodeGenerator.visit_Call(self, node) 795 if (hasattr(fn, 'self') and self.is_triton_tensor(fn.self)) \ 796 or impl.is_builtin(fn): --> 797 return fn(*args, _builder=self.builder, **kws) 798 if fn in self.builtins.values():

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/impl/base.py:22, in builtin.<locals>.wrapper(*args, **kwargs) 18 raise ValueError( 19 "Did you forget to add @triton.jit ? " 20 "(_builder argument must be provided outside of JIT functions.)" 21 ) ---> 22 return fn(*args, **kwargs)

TypeError: dot() got an unexpected keyword argument 'trans_b'

The above exception was the direct cause of the following exception:

CompilationError Traceback (most recent call last) Cell In[15], line 1 ----> 1 teacher_train(T_model, cfg, train_loader, test_loader)

Cell In[14], line 39, in teacher_train(model, config, train_loader, test_loader) 37 mask = mask.to(config.device) 38 labels = labels.to(config.device) ---> 39 outputs = model(ids, mask) 40 model.zero_grad() 41 loss = F.cross_entropy(outputs, labels)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[12], line 12, in BERT_Model.forward(self, context, mask) 11 def forward(self, context, mask): ---> 12 outputs = self.bert(context, attention_mask=mask) 13 pooled = outputs[1] 14 out = self.fc(pooled)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:608, in BertModel.forward(self, input_ids, token_type_ids, attention_mask, position_ids, output_all_encoded_layers, masked_tokens_mask, **kwargs) 605 first_col_mask[:, 0] = True 606 subset_mask = masked_tokens_mask | first_col_mask --> 608 encoder_outputs = self.encoder( 609 embedding_output, 610 attention_mask, 611 output_all_encoded_layers=output_all_encoded_layers, 612 subset_mask=subset_mask) 614 if masked_tokens_mask is None: 615 sequence_output = encoder_outputs[-1]

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:446, in BertEncoder.forward(self, hidden_states, attention_mask, output_all_encoded_layers, subset_mask) 444 if subset_mask is None: 445 for layer_module in self.layer: --> 446 hidden_states = layer_module(hidden_states, 447 cu_seqlens, 448 seqlen, 449 None, 450 indices, 451 attn_mask=attention_mask, 452 bias=alibi_attn_mask) 453 if output_all_encoded_layers: 454 all_encoder_layers.append(hidden_states)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:327, in BertLayer.forward(self, hidden_states, cu_seqlens, seqlen, subset_idx, indices, attn_mask, bias) 305 def forward( 306 self, 307 hidden_states: torch.Tensor, (...) 313 bias: Optional[torch.Tensor] = None, 314 ) -> torch.Tensor: 315 """Forward pass for a BERT layer, including both attention and MLP. 316 317 Args: (...) 325 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) 326 """ --> 327 attention_output = self.attention(hidden_states, cu_seqlens, seqlen, 328 subset_idx, indices, attn_mask, bias) 329 layer_output = self.mlp(attention_output) 330 return layer_output

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:240, in BertUnpadAttention.forward(self, input_tensor, cu_seqlens, max_s, subset_idx, indices, attn_mask, bias) 218 def forward( 219 self, 220 input_tensor: torch.Tensor, (...) 226 bias: Optional[torch.Tensor] = None, 227 ) -> torch.Tensor: 228 """Forward pass for scaled self-attention without padding. 229 230 Arguments: (...) 238 bias: None or (batch, heads, max_seqlen_in_batch, max_seqlen_in_batch) 239 """ --> 240 self_output = self.self(input_tensor, cu_seqlens, max_s, indices, 241 attn_mask, bias) 242 if subset_idx is not None: 243 return self.output(index_first_axis(self_output, subset_idx), 244 index_first_axis(input_tensor, subset_idx))

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs) 1496 # If we don't have any hooks, we want to skip the rest of the logic in 1497 # this function, and just call forward. 1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks 1499 or _global_backward_pre_hooks or _global_backward_hooks 1500 or _global_forward_hooks or _global_forward_pre_hooks): -> 1501 return forward_call(*args, **kwargs) 1502 # Do not call functions when jit is used 1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py:181, in BertUnpadSelfAttention.forward(self, hidden_states, cu_seqlens, max_seqlen_in_batch, indices, attn_mask, bias) 179 bias_dtype = bias.dtype 180 bias = bias.to(torch.float16) --> 181 attention = flash_attn_qkvpacked_func(qkv, bias) 182 attention = attention.to(orig_dtype) 183 bias = bias.to(bias_dtype)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/torch/autograd/function.py:506, in Function.apply(cls, *args, **kwargs) 503 if not torch._C._are_functorch_transforms_active(): 504 # See NOTE: [functorch vjp and autograd interaction] 505 args = _functorch.utils.unwrap_dead_wrappers(args) --> 506 return super().apply(*args, **kwargs) # type: ignore[misc] 508 if cls.setup_context == _SingleLevelFunction.setup_context: 509 raise RuntimeError( 510 'In order to use an autograd.Function with functorch transforms ' 511 '(vmap, grad, jvp, jacrev, ...), it must override the setup_context ' 512 'staticmethod. For more details, please see ' 513 'https://pytorch.org/docs/master/notes/extending.func.html')

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:1021, in _FlashAttnQKVPackedFunc.forward(ctx, qkv, bias, causal, softmax_scale) 1019 if qkv.stride(-1) != 1: 1020 qkv = qkv.contiguous() -> 1021 o, lse, ctx.softmax_scale = _flash_attn_forward( 1022 qkv[:, :, 0], 1023 qkv[:, :, 1], 1024 qkv[:, :, 2], 1025 bias=bias, 1026 causal=causal, 1027 softmax_scale=softmax_scale) 1028 ctx.save_for_backward(qkv, o, lse, bias) 1029 ctx.causal = causal

File ~/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py:826, in _flash_attn_forward(q, k, v, bias, causal, softmax_scale) 823 # BLOCK = 128 824 # num_warps = 4 if d <= 64 else 8 825 grid = lambda META: (triton.cdiv(seqlen_q, META['BLOCK_M']), batch * nheads) --> 826 _fwd_kernel[grid]( # type: ignore 827 q, 828 k, 829 v, 830 bias, 831 o, 832 lse, 833 tmp, 834 softmax_scale, 835 q.stride(0), 836 q.stride(2), 837 q.stride(1), 838 k.stride(0), 839 k.stride(2), 840 k.stride(1), 841 v.stride(0), 842 v.stride(2), 843 v.stride(1), 844 *bias_strides, 845 o.stride(0), 846 o.stride(2), 847 o.stride(1), 848 nheads, 849 seqlen_q, 850 seqlen_k, 851 seqlen_q_rounded, 852 d, 853 seqlen_q // 32, 854 seqlen_k // 32, # key for triton cache (limit number of compilations) 855 # Can't use kwargs here because triton autotune expects key to be args, not kwargs 856 # IS_CAUSAL=causal, BLOCK_HEADDIM=d, 857 bias_type, 858 causal, 859 BLOCK_HEADDIM, 860 # BLOCK_M=BLOCK, BLOCK_N=BLOCK, 861 # num_warps=num_warps, 862 # num_stages=1, 863 ) 864 return o, lse, softmax_scale

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:90, in Autotuner.run(self, *args, **kwargs) 88 if config.pre_hook is not None: 89 config.pre_hook(self.nargs) ---> 90 return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/runtime/autotuner.py:199, in Heuristics.run(self, *args, **kwargs) 197 for v, heur in self.values.items(): 198 kwargs[v] = heur({**dict(zip(self.arg_names, args)), **kwargs}) --> 199 return self.fn.run(*args, **kwargs)

File <string>:41, in _fwd_kernel(Q, K, V, Bias, Out, Lse, TMP, softmax_scale, stride_qb, stride_qh, stride_qm, stride_kb, stride_kh, stride_kn, stride_vb, stride_vh, stride_vn, stride_bb, stride_bh, stride_bm, stride_ob, stride_oh, stride_om, nheads, seqlen_q, seqlen_k, seqlen_q_rounded, headdim, CACHE_KEY_SEQLEN_Q, CACHE_KEY_SEQLEN_K, BIAS_TYPE, IS_CAUSAL, BLOCK_HEADDIM, EVEN_M, EVEN_N, EVEN_HEADDIM, BLOCK_M, BLOCK_N, grid, num_warps, num_stages, extern_libs, stream, warmup)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1621, in compile(fn, **kwargs) 1619 next_module = parse(path) 1620 else: -> 1621 next_module = compile(module) 1622 fn_cache_manager.put(next_module, f"{name}.{ir}") 1623 if os.path.exists(path):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:1550, in compile.<locals>.<lambda>(src) 1545 extern_libs = kwargs.get("extern_libs", dict()) 1546 # build compilation stages 1547 stages = { 1548 "ast": (lambda path: fn, None), 1549 "ttir": (lambda path: parse_mlir_module(path, context), -> 1550 lambda src: ast_to_ttir(src, signature, configs[0], constants)), 1551 "ttgir": (lambda path: parse_mlir_module(path, context), 1552 lambda src: ttir_to_ttgir(src, num_warps, num_stages, capability)), 1553 "llir": (lambda path: Path(path).read_text(), 1554 lambda src: ttgir_to_llir(src, extern_libs, capability)), 1555 "ptx": (lambda path: Path(path).read_text(), 1556 lambda src: llir_to_ptx(src, capability)), 1557 "cubin": (lambda path: Path(path).read_bytes(), 1558 lambda src: ptx_to_cubin(src, capability)) 1559 } 1560 # find out the signature of the function 1561 if isinstance(fn, triton.runtime.JITFunction):

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:962, in ast_to_ttir(fn, signature, specialization, constants) 961 def ast_to_ttir(fn, signature, specialization, constants): --> 962 mod, _ = build_triton_ir(fn, signature, specialization, constants) 963 return optimize_triton_ir(mod)

File ~/anaconda3/envs/pytorch_python38/lib/python3.8/site-packages/triton/compiler.py:942, in build_triton_ir(fn, signature, specialization, constants) 940 if node is None or isinstance(e, (NotImplementedError, CompilationError)): 941 raise e --> 942 raise CompilationError(fn.src, node) from e 943 ret = generator.module 944 # module takes ownership of the context

hosnaa commented 1 year ago

I have this issue too. I hit the same error, TypeError: dot() got an unexpected keyword argument 'trans_b', so I removed that keyword from the code (not good practice, though). That yielded another error about incompatible shapes in the dot product. I'm still trying to figure it out; if any of the authors can point us to the cause, please do.

QAQ1551QAQ commented 1 year ago

This is an automatic holiday reply from QQ Mail. Hello, I am currently on vacation and unable to reply to your email in person. I will get back to you as soon as possible after the holiday ends.

Zhihan1996 commented 1 year ago

Hey,

Can you please try to install triton from source with:

git clone https://github.com/openai/triton.git;
cd triton/python;
pip install cmake; # build-time dependency
pip install -e .
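
If the editable install succeeds, a quick sanity check (a minimal sketch; it just confirms which triton build the interpreter actually imports):

import triton
print(triton.__version__)  # should report the freshly built version
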
hosnaa commented 1 year ago

When I tried this it yielded an error:

subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'TritonRelBuildWithAsserts', '-j64']' returned non-zero exit status 2.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for triton
Failed to build triton
ERROR: Could not build wheels for triton, which is required to install pyproject.toml-based projects

The call stack is very long but this is a snippet of it.


********************************************************************************
              An error happened while installing `triton` in editable mode.

              The following steps are recommended to help debug this problem:

              - Try to install the project normally, without using the editable mode.
                Does the error still persist?
                (If it does, try fixing the problem before attempting the editable mode).
              - If you are using binary extensions, make sure you have all OS-level
                dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
              - Try the latest version of setuptools (maybe the error was already fixed).
              - If you (or your project dependencies) are using any setuptools extension
                or customization, make sure they support the editable mode.

              After following the steps above, if the problem still persists and
              you think this is related to how setuptools handles editable installations,
              please submit a reproducible example
              (see https://stackoverflow.com/help/minimal-reproducible-example) to:

                  https://github.com/pypa/setuptools/issues

              See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
raphaelmourad commented 1 year ago

Hi guys, I found the solution. I spent 5 hours on it...

So the problem is that the model "zhihan1996/DNABERT-2-117M", which we load using AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True), ships a script called flash_attn_triton.py (see https://huggingface.co/zhihan1996/DNABERT-2-117M/blob/main/flash_attn_triton.py) in which the line qk += tl.dot(q, k, trans_b=True) is no longer compatible with triton 2.0.1 (the one you get from pip install triton) or triton 2.1.0 (the one you get from git clone https://github.com/openai/triton.git): the function tl.dot() no longer accepts the parameter trans_b. See also https://github.com/microsoft/DeepSpeed/issues/3491. The compatible triton version is 2.0.0.dev20221202 (it still has the trans_b parameter).

So you have to do:

pip install triton==2.0.0.dev20221202

Unfortunately, this will install torch 1.13.1 instead of 2.x.

But everything works fine after that :).
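
For reference, here is the offending line and a hedged rewrite for newer triton (untested here; the pinned 2.0.0.dev20221202 route above is what I know works, and tl.trans is assumed to be the newer-triton equivalent of the removed trans_b keyword):

# flash_attn_triton.py, inside _fwd_kernel — patch sketch for triton >= 2.1:
# - qk += tl.dot(q, k, trans_b=True)   # only valid on triton 2.0.0.dev20221202
# + qk += tl.dot(q, tl.trans(k))       # transpose k explicitly instead
# The same trans_a/trans_b pattern appears in the backward kernel as well.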

pjsample commented 1 year ago

@raphaelmourad were you able to run the finetuning scripts after installing that version of Triton? I attempted to, but encountered a new error with Triton.

@Zhihan1996 I previously recommended installing directly from the Triton GitHub repo, which I thought solved the issue, but upon further inspection it did not. Have you been able to run the scripts from a fresh install?

raphaelmourad commented 1 year ago

@pjsample I could not run the scripts, as other bugs kept appearing. I had to write my own script, and it worked after a lot of modifications.

raphaelmourad commented 1 year ago

The problem here is that we need the right versions of all the Python modules, but they were not given. As a module like triton gets updated, it becomes incompatible with DNABERT-2.

GriffithLin commented 1 year ago

@raphaelmourad Hey, could you share the script that ran successfully?

raphaelmourad commented 1 year ago

@GriffithLin I made my own Jupyter notebook as follows (I had to fix a lot of bugs):

LOAD PYTHON MODULES

# Load basic modules
import os
import sys
import time
from os import path
import gc

# Load data and machine learning modules
import numpy as np
import pandas as pd

import torch
import triton
from transformers import AutoTokenizer, AutoModel
from torch.utils.data import TensorDataset, DataLoader

# Print versions for compatibility checks
print(np.__version__)  # Be careful: numpy should be 1.19 (and not 1.2x) for spektral to work!
print(triton.__version__)
print(torch.cuda.get_device_name(0))

SET DIRECTORY

os.chdir("/media/mourad/SSD2/DataAugmentDL")
print(os.getcwd())

LOAD DNABERT MODULE

sys.path.append("/media/mourad/SSD2/DataAugmentDL/DNABERT2/DNABERT_2-main/finetune/")
from train import *

model_args = ModelArguments()
data_args = DataArguments()
training_args = TrainingArguments  # note: the class itself; attributes are set on it below

data_args.data_path = "/media/mourad/SSD2/DataAugmentDL/DNABERT2/GUE/EMP/H3K4me1/"
model_args.model_name_or_path = "/media/mourad/SSD2/DataAugmentDL/DNABERT2/DNABERT-2-117M/"

training_args.deepspeed_plugin = None
training_args.log_level = "info"
training_args.run_name = "DNABERT2_aug"
training_args.model_max_length = 20
training_args.per_device_train_batch_size = 32
training_args.per_device_eval_batch_size = 16
training_args.gradient_accumulation_steps = 1
training_args.learning_rate = 3e-5
training_args.num_train_epochs = 4
training_args.fp16 = False
training_args.save_steps = 400
training_args.output_dir = "results/DNABERT2/" + expe  # 'expe' comes from my own pipeline
training_args.evaluation_strategy = "steps"
training_args.eval_steps = 100
training_args.warmup_steps = 50
training_args.logging_steps = 100000
training_args.find_unused_parameters = False

# Other arguments to add since it was bugging
training_args.device = torch.device('cuda:0')
training_args.report_to = ["tensorboard"]
training_args.world_size = 1
training_args.per_device_train_batch_size = 8
training_args.train_batch_size = 32
training_args.eval_batch_size = 32
training_args.test_batch_size = 32
training_args.batch_size = 32
training_args.num_training_steps = 100
training_args.n_gpu = 1
training_args.distributed_state = None
training_args.local_rank = -1

# Load tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    model_max_length=training_args.model_max_length,
    padding_side="right",
    use_fast=True,
    trust_remote_code=True,
)

if "InstaDeepAI" in model_args.model_name_or_path:
    tokenizer.eos_token = tokenizer.pad_token

# Define datasets and data collator
train_dataset = SupervisedDataset(tokenizer=tokenizer,
                                  data_path=os.path.join(data_args.data_path, "train.csv"),
                                  kmer=data_args.kmer)
val_dataset = SupervisedDataset(tokenizer=tokenizer,
                                data_path=os.path.join(data_args.data_path, "dev.csv"),
                                kmer=data_args.kmer)
test_dataset = SupervisedDataset(tokenizer=tokenizer,
                                 data_path=os.path.join(data_args.data_path, "test.csv"),
                                 kmer=data_args.kmer)
data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)

# Load model
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path,
    cache_dir=training_args.cache_dir,
    num_labels=train_dataset.num_labels,
    trust_remote_code=True,
    output_hidden_states=False,
)

# Configure LoRA
if model_args.use_lora:
    lora_config = LoraConfig(
        r=model_args.lora_r,
        lora_alpha=model_args.lora_alpha,
        target_modules=list(model_args.lora_target_modules.split(",")),
        lora_dropout=model_args.lora_dropout,
        bias="none",
        task_type="SEQ_CLS",
        inference_mode=False,
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

# Define trainer
trainer = transformers.Trainer(model=model,
                               tokenizer=tokenizer,
                               args=training_args,
                               compute_metrics=compute_metrics,
                               train_dataset=train_dataset,
                               eval_dataset=val_dataset,
                               data_collator=data_collator)
trainer.local_rank = training_args.local_rank
trainer.train()

# Get the evaluation results from trainer
if training_args.eval_and_save_results:
    results_path = training_args.output_dir + "/" + augmentation + "/metrics"  # 'augmentation' comes from my own pipeline
    results = trainer.evaluate(eval_dataset=test_dataset)
    os.makedirs(results_path, exist_ok=True)
    with open(os.path.join(results_path, "test_results.json"), "w") as f:
        json.dump(results, f)

GriffithLin commented 1 year ago

@raphaelmourad thanks! Another question about the environment: I have installed triton 2.0.0.dev20221202 and torch 1.13.1, but when I run the test code from the Quick Start, I get the error RuntimeError: Triton requires CUDA 11.4+ (my CUDA version is 11.7, which should satisfy that).


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
File "", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0--2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-42648570729a4835b21c1c18cebedbfe-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float16, torch.float16, torch.float16, torch.float16, torch.float16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('matrix', False, 64, False, False, True, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 17, in
hidden_states = model(inputs)[0] # [1, sequence_length, 768]
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 608, in forward
encoder_outputs = self.encoder(
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 446, in forward
hidden_states = layer_module(hidden_states,
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 327, in forward
attention_output = self.attention(hidden_states, cu_seqlens, seqlen,
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 240, in forward
self_output = self.self(input_tensor, cu_seqlens, max_s, indices,
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/bert_layers.py", line 181, in forward
attention = flash_attn_qkvpacked_func(qkv, bias)
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py", line 1021, in forward
o, lse, ctx.softmax_scale = _flash_attn_forward(
File "/data3/linming/.cache/huggingface/modules/transformers_modules/zhihan1996/DNABERT-2-117M/5fd206e1a13cee3ef4a608677312175eb6f8143d/flash_attn_triton.py", line 826, in _flash_attn_forward
_fwd_kernel[grid]( # type: ignore
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 86, in run
return self.fn.run(*args, num_warps=config.num_warps, num_stages=config.num_stages, **kwargs, **config.kwargs)
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/triton/runtime/autotuner.py", line 200, in run
return self.fn.run(*args, **kwargs)
File "", line 41, in _fwd_kernel
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 1256, in compile
asm, shared, kernel_name = _compile(fn, signature, device, constants, configs[0], num_warps, num_stages,
File "/data3/linming/.conda/envs/dna/lib/python3.8/site-packages/triton/compiler.py", line 901, in _compile
name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: Triton requires CUDA 11.4+
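
A quick sanity check of what the Python stack itself reports (a minimal sketch; note that nvidia-smi shows the driver's CUDA version, which is not necessarily the toolkit triton compiles with):

import torch
import triton

print(triton.__version__)             # e.g. 2.0.0.dev20221202
print(torch.version.cuda)             # CUDA toolkit torch was built with, e.g. 11.7
print(torch.cuda.is_available())      # True if the driver and device are visible
print(torch.cuda.get_device_name(0))  # the GPU torch will use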

Here is my environment

# packages in environment at /data3/linming/.conda/envs/dna:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
accelerate                0.19.0                   pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
async-timeout             4.0.2                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
ca-certificates           2023.05.30           h06a4308_0    defaults
certifi                   2023.5.7                 pypi_0    pypi
charset-normalizer        3.1.0                    pypi_0    pypi
cmake                     3.26.3                   pypi_0    pypi
datasets                  2.12.0                   pypi_0    pypi
dill                      0.3.6                    pypi_0    pypi
einops                    0.6.1                    pypi_0    pypi
evaluate                  0.4.0                    pypi_0    pypi
filelock                  3.12.0                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.5.0                 pypi_0    pypi
huggingface-hub           0.14.1                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
jinja2                    3.1.2                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
lit                       16.0.6                   pypi_0    pypi
markupsafe                2.1.2                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
multiprocess              0.70.14                  pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
networkx                  3.1                      pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
omegaconf                 2.3.0                    pypi_0    pypi
openssl                   3.0.9                h7f8727e_0    defaults
packaging                 23.1                     pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
peft                      0.3.0                    pypi_0    pypi
pillow                    9.5.0                    pypi_0    pypi
pip                       23.1.2           py38h06a4308_0    defaults
psutil                    5.9.5                    pypi_0    pypi
pyarrow                   12.0.0                   pypi_0    pypi
python                    3.8.17               h955ad1f_0    defaults
python-dateutil           2.8.2                    pypi_0    pypi
pytz                      2023.3                   pypi_0    pypi
pyyaml                    6.0                      pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
regex                     2023.5.5                 pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
responses                 0.18.0                   pypi_0    pypi
safetensors               0.3.1                    pypi_0    pypi
setuptools                67.8.0           py38h06a4308_0    defaults
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
sympy                     1.12                     pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
tokenizers                0.13.3                   pypi_0    pypi
torch                     1.13.1                   pypi_0    pypi
torchaudio                2.0.2                    pypi_0    pypi
torchvision               0.15.2                   pypi_0    pypi
tqdm                      4.65.0                   pypi_0    pypi
transformers              4.30.2                   pypi_0    pypi
triton                    2.0.0.dev20221202          pypi_0    pypi
typing-extensions         4.7.0                    pypi_0    pypi
tzdata                    2023.3                   pypi_0    pypi
urllib3                   2.0.3                    pypi_0    pypi
wheel                     0.38.4           py38h06a4308_0    defaults
xxhash                    3.2.0                    pypi_0    pypi
xz                        5.4.2                h5eee18b_0    defaults
yarl                      1.9.2                    pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults
raphaelmourad commented 1 year ago

@GriffithLin I have installed CUDA 12.2. Driver: NVIDIA-SMI 535.54.03. GPU: RTX 3090 (24 GB). Now you know my whole config ;).

Package Version

absl-py 1.4.0
accelerate 0.21.0
aiohttp 3.8.4
aiosignal 1.3.1
asttokens 2.2.1
async-timeout 4.0.2
attrs 23.1.0
backcall 0.2.0
backports.functools-lru-cache 1.6.5
bio 1.5.9
biopython 1.81
biothings-client 0.3.0
Brotli 1.0.9
cachetools 5.3.1
certifi 2023.5.7
charset-normalizer 3.2.0
click 8.1.5
cmake 3.26.4
colorama 0.4.6
comm 0.1.3
contourpy 1.1.0
cycler 0.11.0
dataclasses 0.8
datasets 2.13.1
debugpy 1.6.7
decorator 5.1.1
dill 0.3.6
einops 0.6.1
executing 1.2.0
fairscale 0.4.13
filelock 3.12.2
fonttools 4.41.0
frozenlist 1.4.0
fsspec 2023.6.0
google-auth 2.22.0
google-auth-oauthlib 1.0.0
gprofiler-official 1.0.0
grpcio 1.56.0
h5py 3.9.0
huggingface-hub 0.16.4
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.0.0
IProgress 0.4
ipykernel 6.24.0
ipython 8.12.0
ipywidgets 8.0.7
jedi 0.18.2
Jinja2 3.1.2
joblib 1.3.0
jupyter_client 8.3.0
jupyter_core 4.12.0
jupyterlab-widgets 3.0.8
kiwisolver 1.4.4
lit 16.0.6
Markdown 3.4.3
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
mygene 3.2.2
mypy-extensions 1.0.0
nest-asyncio 1.5.6
networkx 3.1
numpy 1.24.4
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
packaging 23.1
pandas 2.0.3
parso 0.8.3
peft 0.4.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 10.0.0
pip 23.2
platformdirs 3.9.1
pooch 1.7.0
progressbar 2.5
prompt-toolkit 3.0.39
protobuf 4.23.4
psutil 5.9.5
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 12.0.1
pyasn1 0.5.0
pyasn1-modules 0.3.0
Pygments 2.15.1
pyparsing 3.0.9
pyre-extensions 0.0.30
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0.1
pyzmq 25.1.0
regex 2023.6.3
requests 2.31.0
requests-oauthlib 1.3.1
responses 0.18.0
rsa 4.9
sacremoses 0.0.53
safetensors 0.3.1
scikit-learn 1.3.0
scipy 1.10.1
setuptools 68.0.0
six 1.16.0
stack-data 0.6.2
sympy 1.12
tensorboard 2.13.0
tensorboard-data-server 0.7.1
threadpoolctl 3.2.0
tokenizers 0.13.3
torch 1.13.1
torchbearer 0.5.3
torcheval 0.0.6
torchtnt 0.1.0
tornado 6.3.2
tqdm 4.65.0
traitlets 5.9.0
transformers 4.30.2
triton 2.0.0.dev20221202
typing_extensions 4.7.1
typing-inspect 0.9.0
tzdata 2023.3
urllib3 1.26.16
wcwidth 0.2.6
Werkzeug 2.3.6
wheel 0.40.0
widgetsnbextension 4.0.8
xxhash 0.0.0
yarl 1.9.2
zipp 3.16.2

raphaelmourad commented 1 year ago

@GriffithLin Also, in the module train.py (folder "finetune"), I changed the following (I modified the functions get_process_log_level() and get_warmup_steps(), as there were bugs):

@dataclass
class TrainingArguments():  # was: class TrainingArguments(transformers.TrainingArguments)
    cache_dir: Optional[str] = field(default=None)
    run_name: str = field(default="run")
    optim: str = field(default="adamw_torch")
    model_max_length: int = field(default=512, metadata={"help": "Maximum sequence length."})
    gradient_accumulation_steps: int = field(default=1)
    per_device_train_batch_size: int = field(default=1)
    per_device_eval_batch_size: int = field(default=1)
    num_train_epochs: int = field(default=1)
    fp16: bool = field(default=False)
    logging_steps: int = field(default=100)
    log_level: str = field(default="info")
    save_steps: int = field(default=100)
    eval_steps: int = field(default=100)
    evaluation_strategy: str = field(default="steps")
    warmup_steps: int = field(default=50)
    weight_decay: float = field(default=0.01)
    learning_rate: float = field(default=1e-4)
    save_total_limit: int = field(default=3)
    load_best_model_at_end: bool = field(default=True)
    output_dir: str = field(default="output")
    find_unused_parameters: bool = field(default=False)
    checkpointing: bool = field(default=False)
    dataloader_pin_memory: bool = field(default=False)
    eval_and_save_results: bool = field(default=True)
    save_model: bool = field(default=False)
    seed: int = field(default=42)

    def get_process_log_level(self):
        return 10  # logging.DEBUG

    def get_warmup_steps(self, num_training_steps):
        return 8
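
A quick way to sanity-check the stubbed class (a sketch, assuming the @dataclass decorator is kept as above; 10 is logging.DEBUG, which is what Trainer expects get_process_log_level() to return):

import logging

args = TrainingArguments()
assert args.get_process_log_level() == logging.DEBUG  # == 10
assert args.get_warmup_steps(1000) == 8  # warmup is fixed regardless of training steps
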
wzy-Sarah commented 1 year ago

@raphaelmourad thanks! Another question about the environment: I have installed triton 2.0.0.dev20221202 and torch 1.13.1, but when I run the test code from the Quick Start, I get the error RuntimeError: Triton requires CUDA 11.4+ (my CUDA version is 11.7, which should satisfy that).

I have the same problem as you; have you tackled it?

raphaelmourad commented 1 year ago

@raphaelmourad thanks! Another question about the environment: I have installed triton 2.0.0.dev20221202 and torch 1.13.1, but when I run the test code from the Quick Start, I get the error RuntimeError: Triton requires CUDA 11.4+ (my CUDA version is 11.7, which should satisfy that).

I have the same problem as you; have you tackled it?

@wzy-Sarah I have CUDA Version: 12.2.

wzy-Sarah commented 1 year ago

@raphaelmourad thanks! Another question about the environment: I have installed triton 2.0.0.dev20221202 and torch 1.13.1, but when I run the test code from the Quick Start, I get the error RuntimeError: Triton requires CUDA 11.4+ (my CUDA version is 11.7, which should satisfy that).

I have the same problem as you; have you tackled it?

@wzy-Sarah I have CUDA Version: 12.2.

Could it work by downgrading the torch version?

philippbayer commented 1 year ago

I think I solved it on my system. I have an NVIDIA A100; nvidia-smi reports Driver Version: 535.104.05, CUDA Version: 12.2. I got the same error about triton wanting CUDA 11.4+.

Made a new environment:

mamba create -n dna python=3.8
conda activate dna

Then I forced the torch CUDA version:

pip install torch==1.13.1+cu117  --extra-index-url https://download.pytorch.org/whl/cu117

Then I installed the required packages via this requirements.txt (not pulling/installing triton from github):

triton==2.0.0.dev20221202
transformers==4.29.2
scikit-learn
peft
einops

Finally, I had to install a CUDA 11 nvcc in the conda environment; I believe triton gets confused by the system-wide CUDA 12 nvcc binary.

 mamba install -c "nvidia/label/cuda-11.7.0" cuda-nvcc

At least the example data works now :)
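
One way to verify that the conda env's nvcc (and not the system-wide CUDA 12 one) is being picked up — a minimal sketch:

import shutil
import subprocess

import torch

# nvcc should resolve inside the conda env (e.g. .../envs/dna/bin/nvcc),
# not under the system-wide CUDA 12 installation.
print(shutil.which("nvcc"))
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
print(torch.version.cuda)  # the toolkit torch was built against (11.7 here)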

Command:

export DATA_PATH=`pwd`/DNABERT_2/sample_data
export LR=3e-5
export MAX_LENGTH=100

python DNABERT_2/finetune/train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path  ${DATA_PATH} \
    --kmer -1 \
    --run_name DNABERT2_${DATA_PATH} \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --fp16 \
    --save_steps 200 \
    --output_dir output/dnabert2 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --warmup_steps 50 \
    --logging_steps 100 \
    --overwrite_output_dir True \
    --log_level info \
    --find_unused_parameters False

nvidia-smi reports Python using the GPU.

Full log:
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
WARNING:root:Perform single sequence classification...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Some weights of the model checkpoint at zhihan1996/DNABERT-2-117M were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at zhihan1996/DNABERT-2-117M and are newly initialized: ['bert.pooler.dense.weight', 'classifier.bias', 'bert.pooler.dense.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using cuda_amp half precision backend
***** Running training *****
  Num examples = 15
  Num Epochs = 5
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 10
  Number of trainable parameters = 117,070,082
  0%|                                                                                                                                                 | 0/10 [00:00

algaebrown commented 1 year ago

I think I solved it on my system. I have an NVIDIA A100; nvidia-smi reports Driver Version: 535.104.05, CUDA Version: 12.2. I got the same error about triton wanting CUDA 11.4+.

Made a new environment:

mamba create -n dna python=3.8
conda activate dna

Then I forced the torch CUDA version:

pip install torch==1.13.1+cu117  --extra-index-url https://download.pytorch.org/whl/cu117

Then I installed the required packages via this requirements.txt (not pulling/installing triton from github):

triton==2.0.0.dev20221202
transformers==4.29.2
scikit-learn
peft
einops

Finally, I had to install a CUDA 11 nvcc in the conda environment; I believe triton gets confused by the system-wide CUDA 12 nvcc binary.

 mamba install -c "nvidia/label/cuda-11.7.0" cuda-nvcc

At least the example data works now :)

Command:

export DATA_PATH=`pwd`/DNABERT_2/sample_data
export LR=3e-5
export MAX_LENGTH=100

python DNABERT_2/finetune/train.py \
    --model_name_or_path zhihan1996/DNABERT-2-117M \
    --data_path ${DATA_PATH} \
    --kmer -1 \
    --run_name DNABERT2_${DATA_PATH} \
    --model_max_length ${MAX_LENGTH} \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate ${LR} \
    --num_train_epochs 5 \
    --fp16 \
    --save_steps 200 \
    --output_dir output/dnabert2 \
    --evaluation_strategy steps \
    --eval_steps 200 \
    --warmup_steps 50 \
    --logging_steps 100 \
    --overwrite_output_dir True \
    --log_level info \
    --find_unused_parameters False

nvidia-smi reports Python using the GPU.

I tried this but got this error: assert q.is_cuda and k.is_cuda and v.is_cuda.
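
That assertion is in flash_attn_triton.py's _flash_attn_forward and fires when the tensors are still on the CPU, so both the model and the tokenized inputs need to be moved to the GPU. A minimal sketch along the lines of the Quick Start (the DNA string is an arbitrary example):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True)
model = AutoModel.from_pretrained("zhihan1996/DNABERT-2-117M", trust_remote_code=True).cuda()

dna = "ACGTAGCATCGGATCTATCTATCGACACTTGGTTATCGATCTACGAGCATCTCGTTAGC"
inputs = tokenizer(dna, return_tensors="pt")["input_ids"].cuda()  # inputs must be on the GPU too
hidden_states = model(inputs)[0]  # [1, sequence_length, 768]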

algaebrown commented 1 year ago

I also tried compiling from source and got the autotune problem: module 'triton' has no attribute 'autotune'.

HelloWorldLTY commented 4 months ago

Hi, I met the same error:

subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'TritonRelBuildWithAsserts', '-j64']' returned non-zero exit status 1.

Are there any efficient solutions?
